A random walk on image patches

Kye M. Taylor and Francois G. Meyer

Abstract. In this paper we address the problem of understanding the success of algorithms that organize patches according to graph-based metrics. Algorithms that analyze patches extracted from images or time series have led to state-of-the-art techniques for classification, denoising, and the study of nonlinear dynamics. The main contribution of this work is to provide a theoretical explanation for the above experimental observations. Our approach relies on a detailed analysis of the commute time metric on prototypical graph models that epitomize the geometry observed in general patch graphs. We prove that a parametrization of the graph based on commute times shrinks the mutual distances between patches that correspond to rapid local changes in the signal, while the distances between patches that correspond to slow local changes expand. In effect, our results explain why the parametrization of the set of patches based on the eigenfunctions of the Laplacian can concentrate patches that correspond to rapid local changes, which would otherwise be shattered in the space of patches. While our results are based on a large sample analysis, numerical experiments on synthetic and real data indicate that the results also hold for the relatively small datasets encountered in practice.
1. Introduction.

Problem statement and motivation. In this paper we address the problem of understanding the success of algorithms that organize patches according to graph-based metrics. Patches are local portions, or snippets, of a signal or an image. The set of patches can be organized by constructing a graph that connects patches that are similar. Indeed, it is reasonably straightforward to measure the similarity between patches that are alike. The graph can then be used to extend the notion of similarity to patches that are very different. For instance, one can measure the distance between two visually different patches by computing the number of edges of the shortest path (geodesic) connecting them. In this work we explore a distance defined by the commute time associated with a random walk defined on the graph.

Algorithms that analyze patch data using graph-based metrics have led to state-of-the-art techniques for classification [36, 41], denoising [5, 7, 22, 24, 33, 37, 38], and studying dynamics [4, 25, 43]. The graph provides a new perspective from which to analyze the similarities between patches, and consequently the local signal or image content they contain. For example, in [4], properties of the graph's geometry, such as the distribution of clustering coefficients and the average geodesic distance between two vertices, are used to separate chaos from noise, or to distinguish different types of chaos. In [36, 37, 38, 41], the geometry of the graph is analyzed by studying a random walk on it. Specifically, the diffusion distance [14] (or spectral distance [32]) and the commute time distance [6] (which is equivalent, up to a scaling factor, to the resistance distance [21]) are two related graph metrics that are derived from the random walk and that can be used to parametrize the graph's geometry. These metrics can be used to efficiently organize patches in a manner that reveals the local behavior of the associated signal or image. In our previous work [36, 41], we noticed that metrics based on a diffusion or a random walk concentrate patches that contain rapid changes in the signal or image data. These patches contain changes associated with singularities (edges), rapid changes in frequency (textures, oscillations), or energetic transients in the underlying function. Furthermore, patches that contain only the smooth parts of the image are more spread out according to such graph metrics.

Outline of our approach and results. The main contribution of this work is to provide 
a theoretical explanation for the above experimental observations. Our approach relies on a 
detailed analysis of the commute time metric on prototypical graph models that epitomize the 
geometry observed in general patch-graphs. We assume that the set of patches is composed of 
two broad classes: patches within which the function varies smoothly and slowly, and patches 
where the function exhibits anomalies: singularities, very rapid change in local frequency, etc. 
We prove that a parametrization of the graph based on commute times shrinks the mutual 
distances between vertices that correspond to rapid local changes relative to the distances 
between vertices that correspond to slow local changes. In effect, our results explain why 
the parametrization of the set of patches based on the eigenfunctions of the Laplacian [37, 
38] can concentrate anomalous patches, which would otherwise be shattered in the space of 
patches. This concentration phenomenon can then be exploited for further processing of the 
patches (e.g., denoising, classification). While our results are based on a large sample analysis, numerical experiments on synthetic and real data indicate that the results also hold for the relatively small datasets encountered in practice.

Organization. This paper is organized as follows. In the next section, we describe the 
patch-based representation of a signal, and the associated patch-graph. We develop some 
intuition about the graph of patches by studying several examples in section 3. In section 4, 
we describe the embedding of the graph of patches based on commute time. The prototypical 
graph models that allow us to study the parametrization are defined in section 5. The main theoretical results about the embedding of the graph models are presented in section 5.3.
Numerical experiments confirming our theoretical analysis are presented in section 6. We 
finish with a discussion in section 7. 

2. Preliminaries and Notation. For simplicity and without loss of generality, we assume that the signal of interest is formed by a sequence of samples, $\{x_n\}_{n=1}^{N'}$. Because we want to extract $N = N' - (d-1)$ patches from this sequence, we need $d-1$ extra samples at the end (hence the $N'$ samples). We first define the notion of a patch.

Definition 2.1. We define a patch as a vector in $\mathbb{R}^d$ formed by a subsequence of $d$ contiguous samples extracted from the sequence $\{x_n\}_{n=1}^{N'}$:

$$\mathbf{x}_n = [x_n \; x_{n+1} \; \cdots \; x_{n+(d-1)}], \quad n = 1, 2, \ldots, N. \tag{2.1}$$



As we collect all the patches, we form the patch-set in $\mathbb{R}^d$.

Definition 2.2. The patch-set is defined as the set of patches extracted from the sequence $\{x_n\}_{n=1}^{N'}$,

$$\text{patch-set} = \{\mathbf{x}_n, \; n = 1, 2, \ldots, N\}. \tag{2.2}$$
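A minimal NumPy sketch of Definitions 2.1 and 2.2, assuming the samples are stored in a one-dimensional array. The function name `extract_patches` is ours, and indexing is 0-based rather than the 1-based indexing of (2.1).

```python
import numpy as np


def extract_patches(x, d):
    """Return the N x d array whose rows are the patches x_n of (2.1).

    With N' = len(x) samples, N = N' - (d - 1) maximally overlapping
    patches are produced (0-based indexing).
    """
    x = np.asarray(x, dtype=float)
    n_patches = len(x) - (d - 1)
    if n_patches < 1:
        raise ValueError("need at least d samples")
    # Row n holds the d contiguous samples x[n], ..., x[n + d - 1].
    return np.stack([x[n:n + d] for n in range(n_patches)])


patches = extract_patches(np.arange(10.0), d=4)
```

With $N' = 10$ and $d = 4$ this yields $N = 7$ patches; with the $N' = 2072$, $d = 25$ setting of section 3 it yields 2048 patches.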

A main objective of this paper is to understand the organization of the patch-set and relate 
this organization to the presence of local changes in the signal or the image. We note that the 
concept of patch is related to the concept of time-delay embedding. Specifically, if the sequence 
comprises measurements of a dynamical system, then Takens' embedding theorem [35, 39] allows us to replace the unknown phase space of the dynamical system with a topologically equivalent phase space formed by the patch-set (2.2). While in this work we do not assume that the sequence $\{x_n\}$ is an observable of a dynamical system, we nevertheless pursue a similar goal: the organization of patches in $\mathbb{R}^d$.

Throughout this paper, we think about a patch, $\mathbf{x}_n$, in several different ways. Originally, $\mathbf{x}_n$ is simply a snippet of the time series. Then, we think about $\mathbf{x}_n$ as a point in $\mathbb{R}^d$. Later, we also regard $\mathbf{x}_n$ as a vertex of a graph. Keeping these three perspectives in mind is critical to our approach and understanding.

In order to study the discrete structure formed by the patch-set (2.2), we connect patches 
together (using their nearest neighbors) and define a graph (or network) that we call the 
patch-graph. 

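Definition 2.3. The patch-graph, $\Gamma$, is a weighted graph defined as follows.

1. The vertices of $\Gamma$ are the patches $\{\mathbf{x}_n, \; n = 1, \ldots, N\}$.

2. Each vertex $\mathbf{x}_n$ is connected to its $\nu$ nearest neighbors using the metric

$$\rho(\mathbf{x}_n, \mathbf{x}_m) = \left\| \frac{\mathbf{x}_n}{\|\mathbf{x}_n\|} - \frac{\mathbf{x}_m}{\|\mathbf{x}_m\|} \right\|. \tag{2.3}$$

3. The weight $w_{n,m}$ along the edge $\{\mathbf{x}_n, \mathbf{x}_m\}$ is given by

$$w_{n,m} = \begin{cases} e^{-\rho^2(\mathbf{x}_n, \mathbf{x}_m)/\sigma^2} & \text{if } \mathbf{x}_n \text{ is connected to } \mathbf{x}_m, \\ 0 & \text{otherwise.} \end{cases} \tag{2.4}$$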

The edges of the patch-graph encode the similarities between its $N$ vertices. We work with the metric $\rho$ (defined in (2.3)) because it is not sensitive to changes in the local energy of the signal (measured by $\|\mathbf{x}_n\|$). The metric $\rho$ allows us to detect changes in the signal's local frequency content, or local smoothness. The parameter $\sigma$ controls the scaling of the similarity $\rho(\mathbf{x}_n, \mathbf{x}_m)$ between $\mathbf{x}_n$ and $\mathbf{x}_m$ when defining the edge weight $w_{n,m}$. In particular, $w_{n,m}$ drops rapidly to zero as $\rho(\mathbf{x}_n, \mathbf{x}_m)$ becomes larger than $\sigma$.
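A minimal sketch of Definitions 2.3 and 2.4, assuming no patch is identically zero (so $\rho$ is defined) and that the edge $\{\mathbf{x}_n, \mathbf{x}_m\}$ is kept whenever either patch selects the other as one of its $\nu$ nearest neighbors; that symmetrization rule, the function names, and the values of `nu` and `sigma` below are our assumptions, not taken from the text. The dense $O(N^2)$ distance matrix is only meant for small patch-sets.

```python
import numpy as np


def rho_matrix(patches):
    """Pairwise normalized distances rho(x_n, x_m) of (2.3)."""
    u = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    # For unit vectors, ||u_n - u_m||^2 = 2 - 2 <u_n, u_m>.
    return np.sqrt(np.maximum(2.0 - 2.0 * (u @ u.T), 0.0))


def patch_graph(patches, nu, sigma):
    """Weight matrix W of (2.4) and degree matrix D of Definition 2.4."""
    rho = rho_matrix(patches)
    N = rho.shape[0]
    W = np.zeros((N, N))
    for n in range(N):
        # nu nearest neighbors of x_n under rho, excluding x_n itself.
        nbrs = [m for m in np.argsort(rho[n]) if m != n][:nu]
        W[n, nbrs] = np.exp(-rho[n, nbrs] ** 2 / sigma ** 2)
    # Assumption: keep edge {x_n, x_m} if either patch selected the other.
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    return W, D


rng = np.random.default_rng(0)
W, D = patch_graph(rng.standard_normal((50, 25)), nu=5, sigma=1.0)
```

Because $\rho \le 2$ for normalized patches, every selected edge has a strictly positive weight of at most one.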

An important remark about the way we measure distances on the graph is in order here. We use $\rho$ to define the graph topology, i.e., which patch is connected to which patch. This is appropriate since $\rho$ reliably compares patches that are similar (e.g., two patches containing the same edge, but at different locations). On the other hand, as explained in section 4.2, we use the commute time to analyze the global geometry of the patch-graph.
Figure 1. A, B, C: time series composed of $N' = 2072$ samples. The color of signals A and B encodes the local variance (red: large, blue: low). C: seismogram; the color indicates the temporal proximity to a seismic arrival, identified by vertical black lines. See text for more details.




Figure 2. D, E, and F: images of size 128 × 128, 128 × 128, and 240 × 240 pixels, respectively. The color of the pixel at the center of each patch encodes the local variance of the image intensity.

Indeed, the distance defined by $\rho$ becomes useless when we need to compare very different patches (e.g., a patch of a uniform region vs. a patch that contains an edge). As explained in section 4.2, the global organization of the patches can be discovered by studying the speed at which a random walk propagates along the graph (via hitting times).

Finally, we note that the weighted graph is fully characterized by its weight matrix.

Definition 2.4. The weight matrix $W$ is the $N \times N$ matrix with entries $W_{n,m} = w_{n,m}$. The degree matrix is the $N \times N$ diagonal matrix $D$ with entries $D_{n,n} = \sum_{i=1}^{N} w_{n,i}$.

3. Warm up: A first look at the patch-set. The goal of this section is to provide the 
reader with some intuition about the geometry of the patch-set and the associated patch- 
graph. This will help us motivate our graph models and the analysis of their geometry. At 
the end of the section, we provide a sketch of our plan of attack. 
3.1. Examples of signals and images. We construct the patch-set associated with some examples of signals and images. Because it is not practical to visualize the patch-set in $\mathbb{R}^d$ when $d = 25$, we display the projection of the patch-set onto the three-dimensional space that captures the largest variance in the patch-set (computed using principal component analysis). Figure 1 displays three signals $\{x_n\}$, $n = 1, \ldots, N'$, with $N' = 2072$. Patches of size $d = 25$ samples are extracted around each time sample, which results in the maximum overlap between patches. Signal A is a chirp, signal B is a row of the image Lenna (shown in Fig. 2-D), and signal C is a seismogram [41].

In order to quantify the local regularity of signals A and B, we compute the variance 
over each patch, and color the curve according to the magnitude of the local variance: hot 
(red) for large variance and cold (blue) for low variance. The color of signal C encodes the 
temporal proximity to the arrival of a seismic wave associated with an earthquake: hot color 
indicates close proximity, while cold corresponds to baseline activity. Identifying arrival times is necessary for purposes such as locating an earthquake's epicenter. This example illustrates
the application of the present work to the problem of detecting seismic waves [41]. 
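The local-variance coloring and the three-dimensional PCA projection can be sketched as follows. This is a hedged illustration: the chirp below is a hypothetical stand-in for signal A (its exact parameters are not given here), the PCA is applied to the normalized patches $\mathbf{x}_n/\|\mathbf{x}_n\|$ on the assumption that Figure 3 uses the normalized distance $\rho$, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def pca_project(patches, n_components=3):
    """Project normalized patches onto their top principal directions."""
    u = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    centered = u - u.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical chirp standing in for signal A.
t = np.linspace(0.0, 1.0, 2072)
x = np.sin(2 * np.pi * (5 * t + 40 * t ** 2))
patches = sliding_window_view(x, 25)      # N = 2072 - 24 maximally overlapping patches
local_var = patches.var(axis=1)           # drives the red/blue coloring
coords = pca_project(patches)             # 3-D coordinates for a scatter plot
```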

Figure 2 displays three images. We extract patches of size 5 × 5. Here, the patches are not maximally overlapping: we collect every third patch in the horizontal and vertical directions for images D and E, and every fifth patch in each direction for image F. This results in patch-sets of size 42 × 42 for images D and E, and of size 48 × 48 for image F. As before, the color of a pixel in the images encodes the local variance within the patch centered at that pixel.
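The subsampled collection of image patches can be sketched as below, assuming the patch grid starts at the top-left pixel (the exact offset is our assumption). With that choice the quoted patch-set sizes follow: $\lfloor (128-5)/3 \rfloor + 1 = 42$ and $\lfloor (240-5)/5 \rfloor + 1 = 48$. Patches are gathered left-to-right, then top-to-bottom, the ordering described for Figure 4.

```python
import numpy as np


def image_patches(img, size=5, stride=3):
    """Collect size x size patches every `stride` pixels, in raster order."""
    H, W = img.shape
    rows = range(0, H - size + 1, stride)
    cols = range(0, W - size + 1, stride)
    # Left-to-right within a row of patches, then top-to-bottom.
    patches = np.array([img[r:r + size, c:c + size].ravel()
                        for r in rows for c in cols])
    return patches, (len(rows), len(cols))


_, grid_de = image_patches(np.zeros((128, 128)), size=5, stride=3)
_, grid_f = image_patches(np.zeros((240, 240)), size=5, stride=5)
```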

3.2. Projections of the patch-sets. Figure 3 shows the projections of each of the six 
patch-sets. Distances in Figure 3 correspond to the normalized distance $\rho$. We observe that patches with high variance (red-orange) appear to be scattered all over $\mathbb{R}^d$. These patches
correspond to regions where the image intensity varies rapidly. Patches with low variance 
(blue-green), which correspond to regions where the signal is smooth and varies very little, 
tend to be concentrated along one-dimensional curves (for time series) and two-dimensional 
surfaces (for images). These visual observations can be confirmed when computing the actual 
mutual distances between patches (data not shown). 

The organization of the patches in the patch-set can be explained using simple arguments. Let us assume that the sequence $\{x_n\}$ corresponds to the sampling of an underlying differentiable function $x(t)$, and assume that $x'(t)$, the derivative of $x(t)$, remains small over the interval of interest. In this case, if two patches $\mathbf{x}_n$ and $\mathbf{x}_m$ overlap significantly (i.e., $|n - m|$ is small), then they will be close to one another in $\mathbb{R}^d$. Indeed, the values of the coordinates of patches $\mathbf{x}_n$ and $\mathbf{x}_m$ will be very similar, since the signal $x(t)$ varies slowly. In principle, if the sampling is fast enough, the patches should lie along a one-dimensional curve in $\mathbb{R}^d$. By the same argument, when $x(t)$ exhibits rapid changes, the magnitude of the derivative, $|x'(t)|$, can be very large, and therefore temporally neighboring patches are not guaranteed to be spatial neighbors in $\mathbb{R}^d$. This argument allows us to understand the distribution of the patches in signal B, or in image F.
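The argument above can be checked numerically on two hypothetical sinusoids: one with a small derivative (period of 500 samples) and one with a large derivative (period of 4 samples). The constant offset of 2 is our assumption, added only to keep patch norms away from zero so that $\rho$ is well defined; the signals are illustrations, not data from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def consecutive_rho(x, d=25):
    """rho(x_n, x_{n+1}) for all temporally adjacent patches of length d."""
    p = sliding_window_view(np.asarray(x, dtype=float), d)
    u = p / np.linalg.norm(p, axis=1, keepdims=True)
    return np.linalg.norm(u[1:] - u[:-1], axis=1)


n = np.arange(2000)
slow = 2.0 + np.sin(2 * np.pi * n / 500)  # small |x'(t)|: adjacent patches stay close
fast = 2.0 + np.sin(2 * np.pi * n / 4)    # large |x'(t)|: adjacent patches jump apart
slow_steps = consecutive_rho(slow)
fast_steps = consecutive_rho(fast)
```

For the slow signal every step between adjacent patches is tiny, so the patches trace a one-dimensional curve; for the fast signal adjacent patches are far apart in $\mathbb{R}^d$.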

Instead of characterizing patches according to the local smoothness of the underlying 
function, we can also analyze the distribution of the patches according to the function's local 
frequency information. This will help us understand the structure of the patch-set for signal A. 
Figure 3. Principal component analysis of patch-sets associated with the time series A-C and the images D-F. Each point represents a patch; the color encodes the variance within the patch (see Figures 1 and 2).



For this type of signal, it is appropriate to measure the distance between the normalized patches, $\mathbf{x}_n/\|\mathbf{x}_n\|$ and $\mathbf{x}_m/\|\mathbf{x}_m\|$, after computing the Fourier transforms (a simple rotation of $\mathbb{R}^d$) of the respective patches. This process is akin to time-frequency analysis. We expect that regions of the signal with little change in local frequency will cluster in $\mathbb{R}^d$: this is the case for the blue patches of the chirp A. On the contrary, when the local frequency content changes rapidly (as in the middle of the chirp A), the corresponding (red/orange) patches will be far from one another in $\mathbb{R}^d$ (see Figure 3-A).
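A quick check of the "rotation" remark, assuming the Fourier transform is the orthonormal DFT (NumPy's `norm="ortho"`), which is unitary: the distance between two normalized patches is the same before and after the transform, so moving to the frequency domain changes the interpretation of the coordinates but not the geometry of the patch-set. The two random patches are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.standard_normal(25), rng.standard_normal(25)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)  # normalized patches

# With norm="ortho" the DFT is unitary, so it preserves Euclidean distances.
a_hat = np.fft.fft(a, norm="ortho")
b_hat = np.fft.fft(b, norm="ortho")

dist_time = np.linalg.norm(a - b)
dist_freq = np.linalg.norm(a_hat - b_hat)  # 2-norm of a complex vector
```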

Finally, we can try to understand the organization of the patch-set for the seismogram C. Let us assume that $\{x_n\}$ is obtained by sampling a function of the form $x(t) = b(t) + w(t)$, where $w(t)$ represents a seismic wave and $b(t)$ represents baseline activity. We can expect that $w(t)$ is a rapidly oscillating transient with a rich frequency content, while $b(t)$ varies slowly. Now consider two patches $\mathbf{x}_n$ and $\mathbf{x}_m$. It can be shown that if both patches are extracted from the baseline function $b(t)$ and do not contain any part of the energetic transient, then their mutual distance is expected to be small. In addition, if $\mathbf{x}_n$ contains part of the energetic transient $w(t)$ and $\mathbf{x}_m$ is extracted from the baseline $b(t)$, then their mutual distance is expected to be large. Finally, if $\mathbf{x}_n$ and $\mathbf{x}_m$ are composed of two different parts of $w(t)$, then their mutual distance is also expected to be large (provided the patches are sufficiently long and $w(t)$ oscillates sufficiently fast). More generally, one can expect that two patches extracted from two different energetic transients $w_1(t)$ and $w_2(t)$ will be at a large distance from one another [41].
Figure 4. The weight matrices $W$ associated with signals A-F are displayed as images: $w_{n,m}$ is encoded as a grayscale value, from white ($w_{n,m} \approx 0$) to black ($w_{n,m} \approx 1$). Dark structures along the diagonal of the $W$ matrix associated with the time series A-C indicate that patches that are close in time are also close in $\mathbb{R}^d$.



3.3. From the patch-set to the patch-graph: the weight matrix W. Having gained some understanding about the organization of the patch-set, we now move to the structure of the patch-graph and its weight matrix $W$. Figure 4 displays the weight matrices built from the patch-sets that correspond to the time series A-C (top) and the images D-F (bottom). Note that when processing time series A-C, the columns (or equivalently, the rows) of $W$ can be identified with temporally ordered time samples. Therefore, large entries along the main diagonal of the weight matrix correspond to patches that are close in time and also close in $\mathbb{R}^d$. For instance, consider the time series A and its associated weight matrix. The dark bands near the top-left and bottom-right of the diagonal correspond to the slowly varying oscillations near the beginning and end of the chirp (see Figure 1). Indeed, large entries along the diagonal of $W$ are a direct consequence of relatively little variation in the time series. On the other hand, the columns of $W$ corresponding to portions of the time series that exhibit rapid local changes (center of Figure 4-A) tend to lack such prominent diagonal structures. For such regions of the matrix $W$, the entries are no longer concentrated along the diagonal, and are shattered across all rows and columns (see the center of $W$ in Figure 4-A; the columns correspond to the fastest oscillations at the center of the chirp). The large distances between these patches are also apparent in the lighter pixel intensities, representing relatively smaller edge weights.
Note that the patches extracted from the seismic data are very far apart, as indicated by the much lighter shades of gray. It is more difficult to relate the ordering of an image's weight matrix to locations in the image itself. For the weight matrices associated with images D-F, the ordering of the columns is equivalent to the order in which the patches were collected from the image plane: first left-to-right, then top-to-bottom (similar to a raster scan, or the way one reads the pages of a book). Hence, periodically repeating dark blocks in the weight matrices associated with images D-F are indicative of image patches that are close in $\mathbb{R}^d$ and close in the image plane as a result of relatively little change in the image's local content. For example, the dark square-like structure that appears near the main diagonal of $W$ in Figure 4-E, and which spans roughly one fifth of the number of columns, corresponds to the mirror's smooth, light border in image E.

3.4. Summary of the experiments and our plan of attack. The experiments in sections 3.1 and 3.3 highlight the fact that patches extracted from regions of an image, or of a signal, that contain anomalies (e.g., singularities, edges, rapid changes in the frequency content) are scattered all over the patch-set, making their detection and identification extremely difficult (see Figures 3-A and 3-F). In contrast, patches from smooth regions appear to cluster along low-dimensional curves or surfaces. Because the anomalous patches are usually the most interesting ones, we need a new parametrization of the patch-set that concentrates the anomalies and separates them from the smooth baseline part of the image. The structure of $W$ in the "rough regions" suggests that patches that contain anomalies are very well connected (see the center of Figure 4-D, which corresponds to the boa on the hat of Lenna). This concept can be quantified by studying how fast a random walk reaches all patches in these rough regions of $W$, which suggests studying the hitting times associated with a random walk on the patch-graph. In the next section we formalize this concept and propose a parametrization of the patch-set in terms of commute time. A theoretical analysis of this approach is provided in section 5.

4. Parametrizing the patch-graph. 

4.1. The fast and slow patches. We first introduce the concept of fast and slow patches. We have noticed that patches that contain anomalies (discontinuities, edges, fast changes in frequency, etc.) in the original signal lead to regions of the matrix $W$ where the nonzero entries are scattered all around. We call such patches fast patches because, as we will see in the following, a random walk diffuses extremely fast in such regions of the patch-graph. Conversely, smooth regions of the signal lead to slow patches that are associated with a small number of large entries in $W$, which are concentrated along the diagonal. A random walk initialized in the slow patch region of the patch-graph diffuses very slowly.

4.2. A better metric on the graph: the commute time. As explained previously, we propose to replace the Euclidean distance, which leads to the scattering of the fast patches seen in Figure 3, by a notion of distance that quantifies the speed at which a random walk diffuses on the patch-graph. We propose to use the commute time. Parametrizing the graph using its commute time distance is closely related to parametrizing the graph using its diffusion distance [14, 23] (see Section 4.2.3). Although the works [7, 37, 38] do not explicitly embed vertices of the patch-graph based on the diffusion distance, they also study a random walk on the patch-graph, and define the diffusion distance in terms of this walk. In these studies, noise is removed by evolving the diffusion process for a short time. A detailed comparison of our approach with the seminal work of [37] is provided in section 7.3. We note that the notion of first-passage time associated with a diffusion (which is equivalent to the hitting time associated with a random walk) has been used extensively to characterize the geometry of complex networks and random media (e.g., [3, 15] and references therein). It is therefore natural to analyze the patch-set with this distance.

4.2.1. A random walk on the patch-graph. In order to define the commute time between two vertices, we first need to define a random walk on the graph. In our problem, the random walk does not correspond to a physical process, but will lead to a notion of global proximity between patches. We consider a first-order homogeneous Markov process, $Z_k$, defined on the vertices of the patch-graph, $\Gamma$, and evolving with the transition probability matrix $P$ given by

$$P_{n,m} = \operatorname{Prob}(Z_{k+1} = \mathbf{x}_m \mid Z_k = \mathbf{x}_n) = \frac{w_{n,m}}{\sum_{l=1}^{N} w_{n,l}} = \frac{w_{n,m}}{D_{n,n}}. \tag{4.1}$$
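In matrix form, (4.1) reads $P = D^{-1}W$. A minimal sketch on a three-vertex path graph with unit weights (a toy example of ours), together with the stationary distribution $\pi_n = D_{n,n} / \sum_{j} D_{j,j}$ used in section 4.2.2:

```python
import numpy as np


def transition_matrix(W):
    """P = D^{-1} W, i.e. P[n, m] = w_{n,m} / D_{n,n} as in (4.1)."""
    degrees = W.sum(axis=1)
    return W / degrees[:, None]


def stationary_distribution(W):
    """pi_n = D_{n,n} / (sum of all weights); satisfies pi @ P == pi."""
    degrees = W.sum(axis=1)
    return degrees / degrees.sum()


# Path graph x_1 - x_2 - x_3 with unit weights.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
P = transition_matrix(W)
pi = stationary_distribution(W)
```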

Consider a slow patch $\mathbf{x}_n$ extracted from a regular/smooth part of the signal. If the random walk starts at $\mathbf{x}_n$, then it can only travel along the low-dimensional structure that corresponds to the temporal neighbors of $\mathbf{x}_n$ (see, e.g., Figure 3-A). The existence of this narrow bottleneck is also visible in the $W$ matrix (see Figure 4-A): a random walk initialized within the fat diagonal of the upper left corner of $W$ (the low frequency part of the chirp) is trapped in this region of the matrix, and can only travel along this fat diagonal. As a result, it takes many steps for the random walk to reach another slow patch $\mathbf{x}_m$ if $|n - m|$ is large. This notion can be quantified by computing the average hitting time, $h(\mathbf{x}_n, \mathbf{x}_m)$, which measures the expected minimum number of steps that it takes for the random walk, started at vertex $\mathbf{x}_n$, to reach the vertex $\mathbf{x}_m$ [6],

$$h(\mathbf{x}_n, \mathbf{x}_m) = \mathbb{E}_n \min\{j \ge 0 : Z_j = \mathbf{x}_m\},$$

where the expectation $\mathbb{E}_n$ is computed when the random walk is initialized at vertex $\mathbf{x}_n$, i.e., when $Z_0 = \mathbf{x}_n$. The commute time [6] provides a symmetric version of $h$, and is defined by

$$\kappa(\mathbf{x}_n, \mathbf{x}_m) = h(\mathbf{x}_n, \mathbf{x}_m) + h(\mathbf{x}_m, \mathbf{x}_n). \tag{4.2}$$
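Hitting times can be computed exactly, without simulation, by first-step analysis: $h(\mathbf{x}_m, \mathbf{x}_m) = 0$ and $h(\mathbf{x}_n, \mathbf{x}_m) = 1 + \sum_k P_{n,k}\, h(\mathbf{x}_k, \mathbf{x}_m)$ for $n \ne m$, a linear system in the remaining unknowns. The sketch below (function names ours, dense solve suited only to small graphs) then forms the commute time (4.2).

```python
import numpy as np


def hitting_times_to(P, target):
    """Vector h with h[n] = h(x_n, x_target).

    Solves h[target] = 0 and h[n] = 1 + sum_k P[n, k] h[k] for n != target.
    """
    N = P.shape[0]
    others = [i for i in range(N) if i != target]
    Q = P[np.ix_(others, others)]  # walk restricted to the non-target vertices
    h = np.zeros(N)
    h[others] = np.linalg.solve(np.eye(N - 1) - Q, np.ones(N - 1))
    return h


def commute_time(P, n, m):
    """kappa(x_n, x_m) = h(x_n, x_m) + h(x_m, x_n), as in (4.2)."""
    return hitting_times_to(P, m)[n] + hitting_times_to(P, n)[m]


# Path graph x_1 - x_2 - x_3 with unit weights; P = D^{-1} W.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
P = W / W.sum(axis=1, keepdims=True)
kappa_ends = commute_time(P, 0, 2)
```

For this path graph the commute time between the two end vertices is 8, in agreement with the classical identity $\kappa = V \cdot R_{\text{eff}}$ with volume $V = 4$ and effective resistance 2.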

4.2.2. Spectral representation of the commute time. When the random walk is reversible and the graph is connected, the commute time can be expressed using the eigenvectors $\phi_1, \ldots, \phi_N$ of the symmetric matrix

$$D^{-1/2} W D^{-1/2} = D^{1/2} P D^{-1/2}.$$

The corresponding eigenvalues can be labeled such that $-1 \le \lambda_N \le \cdots \le \lambda_2 < \lambda_1 = 1$. Each eigenvector $\phi_k$ is a vector with $N$ components, one for each vertex of the graph. Hence, we write

$$\phi_k = [\phi_k(\mathbf{x}_1) \; \phi_k(\mathbf{x}_2) \; \cdots \; \phi_k(\mathbf{x}_N)]^T,$$

to emphasize the fact that we consider $\phi_k$ to be a function sampled on the vertices of $\Gamma$. The commute time can be expressed as

$$\kappa(\mathbf{x}_n, \mathbf{x}_m) = \sum_{k=2}^{N} \frac{1}{1 - \lambda_k} \left( \frac{\phi_k(\mathbf{x}_n)}{\sqrt{\pi_n}} - \frac{\phi_k(\mathbf{x}_m)}{\sqrt{\pi_m}} \right)^2, \tag{4.3}$$

where $\pi_n = \sum_{m=1}^{N} w_{n,m} \big/ \sum_{j,i=1}^{N} w_{j,i}$ is the stationary distribution associated with the transition probability matrix $P$ [27, 36].
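A sketch of (4.3) that computes all pairwise commute times from one symmetric eigendecomposition. It assumes a symmetric $W$ on a connected graph, so that $\lambda_1 = 1$ is simple; eigenvector signs and the choice of basis within repeated eigenvalues do not affect the result, since only squared differences summed over each eigenspace enter (4.3).

```python
import numpy as np


def commute_time_spectral(W):
    """All pairwise commute times kappa(x_n, x_m) via (4.3)."""
    d = W.sum(axis=1)
    pi = d / d.sum()                       # stationary distribution
    S = W / np.sqrt(np.outer(d, d))        # D^{-1/2} W D^{-1/2}
    lam, phi = np.linalg.eigh(S)           # ascending eigenvalues
    lam, phi = lam[::-1], phi[:, ::-1]     # now lam[0] = lambda_1 = 1
    # Column k-2 holds phi_k(x_n) / (sqrt(pi_n) sqrt(1 - lambda_k)), k = 2..N.
    coords = phi[:, 1:] / np.sqrt(pi)[:, None] / np.sqrt(1.0 - lam[1:])
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sum(diff ** 2, axis=2)


# Path graph x_1 - x_2 - x_3 with unit weights.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
K = commute_time_spectral(W)
```

For this path graph, (4.3) gives $\kappa(\mathbf{x}_1, \mathbf{x}_3) = 8$ and $\kappa(\mathbf{x}_1, \mathbf{x}_2) = 4$, the values predicted by $\kappa = V \cdot R_{\text{eff}}$.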

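4.2.3. The relationship to diffusion maps. The diffusion distance [14] between vertices $\mathbf{x}_m$ and $\mathbf{x}_n$, $D_t(\mathbf{x}_m, \mathbf{x}_n)$, measures the distance between the transition probability distributions, computed at time $t$, of two random walks initialized at $\mathbf{x}_n$ and $\mathbf{x}_m$,

$$D_t^2(\mathbf{x}_m, \mathbf{x}_n) = \sum_{j=1}^{N} \frac{\left| P^t_{n,j} - P^t_{m,j} \right|^2}{D_{j,j}}.$$

The diffusion distance can also be decomposed in terms of the eigenvectors $\phi_k$ [14],

$$D_t^2(\mathbf{x}_m, \mathbf{x}_n) = \frac{1}{V} \sum_{k=2}^{N} \lambda_k^{2t} \left( \frac{\phi_k(\mathbf{x}_n)}{\sqrt{\pi_n}} - \frac{\phi_k(\mathbf{x}_m)}{\sqrt{\pi_m}} \right)^2, \tag{4.4}$$

where $V = \sum_{m,n} w_{m,n}$ is the volume of the graph. Summing the geometric series $\sum_{t \ge 0} \lambda_k^{t} = 1/(1 - \lambda_k)$, which converges when $|\lambda_k| < 1$ for $k \ge 2$, and comparing with (4.3), it follows that the commute time is a scaled sum of the squares of diffusion distances computed at all times,

$$\kappa(\mathbf{x}_m, \mathbf{x}_n) = V \sum_{t=0}^{\infty} D^2_{t/2}(\mathbf{x}_m, \mathbf{x}_n). \tag{4.5}$$

The significance of this equation is that the commute time includes the short-term evolution ($t \approx 0$) as well as the asymptotic regime ($t \to \infty$) of the random walk. We will come back to this analysis in section 5.4.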
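A numerical sanity check of (4.4) and (4.5) on a small hypothetical weighted graph with self-loops (self-loops make the graph non-bipartite, so $|\lambda_k| < 1$ for $k \ge 2$ and the series converges). It compares the direct definition of $D_t^2$, computed with matrix powers of $P$ for integer $t$, against the spectral form, then truncates the series (4.5) at 500 terms; the truncation length is our choice.

```python
import numpy as np


def spectral_parts(W):
    """Eigenvalues lambda_2..lambda_N (descending), coords phi_k(x_n)/sqrt(pi_n), volume V."""
    d = W.sum(axis=1)
    V = d.sum()
    lam, phi = np.linalg.eigh(W / np.sqrt(np.outer(d, d)))
    lam, phi = lam[::-1], phi[:, ::-1]
    coords = phi[:, 1:] / np.sqrt(d / V)[:, None]
    return lam[1:], coords, V


def diffusion_sq_direct(W, t, n, m):
    """D_t^2 = sum_j |P^t_{n,j} - P^t_{m,j}|^2 / D_{j,j}, for integer t >= 0."""
    d = W.sum(axis=1)
    Pt = np.linalg.matrix_power(W / d[:, None], t)
    return np.sum((Pt[n] - Pt[m]) ** 2 / d)


def diffusion_sq_spectral(W, t, n, m):
    """Right-hand side of (4.4); t may be a half-integer, since 2t is an integer."""
    lam, coords, V = spectral_parts(W)
    return np.sum(lam ** (2 * t) * (coords[n] - coords[m]) ** 2) / V


# Small connected graph with self-loops (non-bipartite).
W = np.array([[1.0, 0.5, 0.2, 0.0],
              [0.5, 1.0, 0.3, 0.4],
              [0.2, 0.3, 1.0, 0.6],
              [0.0, 0.4, 0.6, 1.0]])

lam, coords, V = spectral_parts(W)
kappa = np.sum((coords[0] - coords[3]) ** 2 / (1.0 - lam))             # (4.3)
series = V * sum(diffusion_sq_spectral(W, t / 2, 0, 3) for t in range(500))  # (4.5), truncated
```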

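4.3. Parametrizing the patch-graph. Equation (4.3) suggests the following embedding $\Psi$ of the patch-graph $\Gamma$ into $\mathbb{R}^{N-1}$,

$$\Psi : \mathbf{x}_n \mapsto \frac{1}{\sqrt{\pi_n}} \left[ \frac{\phi_2(\mathbf{x}_n)}{\sqrt{1-\lambda_2}} \; \frac{\phi_3(\mathbf{x}_n)}{\sqrt{1-\lambda_3}} \; \cdots \; \frac{\phi_N(\mathbf{x}_n)}{\sqrt{1-\lambda_N}} \right]^T, \quad n = 1, 2, \ldots, N. \tag{4.6}$$

If we agree to measure the distance on the graph $\Gamma$ using the square root of the commute time, then the mutual Euclidean distance after embedding is equal to the original distance on the graph,

$$\|\Psi(\mathbf{x}_n) - \Psi(\mathbf{x}_m)\| = \sqrt{\kappa(\mathbf{x}_n, \mathbf{x}_m)}. \tag{4.7}$$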

The result is a direct consequence of (4.3) and (4.6). Similar ideas were first proposed in [32] to embed manifolds and are the foundation of the parametrizations given in [2, 14]. In practice, we need not use all the $N - 1$ coordinates in the embedding defined by (4.6). Indeed, since $\lambda_N \le \cdots \le \lambda_2 < \lambda_1$, we have that $1/\sqrt{1-\lambda_N} \le \cdots \le 1/\sqrt{1-\lambda_3} \le 1/\sqrt{1-\lambda_2}$, and therefore, if we can accept some approximation error, we can use only the first $d'$ coordinates of $\Psi$. As we will see in section 5.4, this dimension reduction further improves the separation between slow patches and fast patches. In the remainder of the paper we work with the embedding of $\Gamma$ into $\mathbb{R}^{d'}$ defined by

$$\Phi : \mathbf{x}_n \mapsto \frac{1}{\sqrt{\pi_n}} \left[ \frac{\phi_2(\mathbf{x}_n)}{\sqrt{1-\lambda_2}} \; \cdots \; \frac{\phi_{d'+1}(\mathbf{x}_n)}{\sqrt{1-\lambda_{d'+1}}} \right]^T. \tag{4.8}$$

We note that we can always choose $d'$ such that the embedding $\Phi$ almost preserves the commute time,

$$\|\Phi(\mathbf{x}_n) - \Phi(\mathbf{x}_m)\|^2 \approx \kappa(\mathbf{x}_n, \mathbf{x}_m). \tag{4.9}$$

In fact, our experiments indicate that this approximation holds for small values of $d'$.
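A minimal sketch of the embeddings $\Psi$ (4.6) and $\Phi$ (4.8), assuming a symmetric weight matrix on a connected graph; the function name is ours. On a five-vertex path graph with unit weights, the squared distance between the end points of $\Psi$ recovers $\kappa = V \cdot R_{\text{eff}} = 8 \cdot 4 = 32$, and the truncated map $\Phi$ can only underestimate it, since it drops nonnegative terms of (4.3).

```python
import numpy as np


def commute_time_embedding(W, d_prime=None):
    """Rows are Psi(x_n) of (4.6); with d_prime given, rows are Phi(x_n) of (4.8)."""
    d = W.sum(axis=1)
    pi = d / d.sum()
    lam, phi = np.linalg.eigh(W / np.sqrt(np.outer(d, d)))
    lam, phi = lam[::-1], phi[:, ::-1]            # lambda_1 = 1 first
    coords = phi[:, 1:] / np.sqrt(pi)[:, None] / np.sqrt(1.0 - lam[1:])
    return coords if d_prime is None else coords[:, :d_prime]


# Path graph with 5 vertices and unit weights.
N = 5
W = np.zeros((N, N))
idx = np.arange(N - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0

Psi = commute_time_embedding(W)              # N x (N - 1)
Phi = commute_time_embedding(W, d_prime=2)   # N x d'
full = np.sum((Psi[0] - Psi[-1]) ** 2)       # equals kappa(x_1, x_5), by (4.7)
trunc = np.sum((Phi[0] - Phi[-1]) ** 2)      # approximation (4.9), never larger
```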
Figure 5. Scatter plot of the patch-sets shown in Figure 3 after parametrizing using $\Phi$ in (4.8), with $d' = 3$. The fast patches (red and orange) are now concentrated and have been lumped together. The slow patches (blue-green) remain aligned along curves (for time series) and surfaces (for images).



4.4. Examples (revisited). Figure 5 displays the embedding of the patch-sets associated with signals and images A-F using the map $\Phi$ (4.8), where $d' = 3$. The blue curve in Figure 5-A corresponds to the slow patches (low frequencies of the chirp) that are connected according to their temporal proximity. On the other hand, red and orange patches extracted from the high frequency part of the chirp are now concentrated in a relatively small region (compare to Figure 3-A). Similar features are seen in the parametrizations of the patch-graphs associated with signals B-F.

5. A model for the patch-graph and the analysis of its embedding. 

5.1. Our approach. The embedding of the patch-graph $\Gamma$ defined by $\Phi$ in (4.8) should lead to a representation of the patch-set in $\mathbb{R}^{d'}$ where distances correspond to commute times measured along the graph before embedding. Our goal is to explain the concentration of the fast patches created by the embedding $\Phi$ (see, e.g., Figure 5). Our approach is based on a theoretical analysis of a graph model that epitomizes the characteristic features observed in patch-graphs composed of a mixture of fast and slow patches. This model is composed of two subgraphs: a subgraph of slow patches, which are extracted from the smooth regions of the signal, and a subgraph of fast patches, which are extracted from the regions of the signal that contain singularities, changes in frequency, or energetic transients. We confirm our theoretical analysis with numerical experiments using synthetic signals in section 6, and we demonstrate that our conclusions are in fact applicable to a larger class of patch-graphs. The graph models are introduced in section 5.2. Our theoretical analysis of the embedding of the graph models is given in section 5.3. We evaluate the performance of the embedding $\Phi$ when $d'$ is small in section 5.4.

5.2. The prototypical graph models. We define the graph models in terms of the nonzero 
entries in the associated weight matrix W. Without loss of generality, we assume that the 
number of vertices N is even. 

The slow graph model. The weight matrix $W$ of a patch-graph composed only of slow patches has large entries when $|n - m|$ is small¹: temporal/spatial proximity implies proximity in patch-space (see, e.g., Figure 4-A, top-left corner). We therefore define the slow graph model as follows.

Definition 5.1. The slow graph $\mathcal{S}(N,L)$ is a weighted graph composed of $N$ vertices, $\mathbf{x}_1, \ldots, \mathbf{x}_N$. The weight on the edge $\{\mathbf{x}_n, \mathbf{x}_m\}$ is defined by

$$w_{n,m} = \begin{cases} w_{\mathcal{S}} & \text{if } |n - m| \le L, \\ 0 & \text{otherwise,} \end{cases} \quad \text{for } 1 \le n, m \le N \text{ and } 2L + 1 \le N. \tag{5.1}$$

The weight $w_{\mathcal{S}}$ is a positive real number that models the distance between two temporally adjacent patches. The parameter $L$ characterizes the thickness of the diagonal in $W$. The slow graph is connected and each vertex has at most $2L$ neighbors, not including self-connections (see Figure 6). Hence, we require that $2L + 1 \le N$. Finally, note that the slow graph is distinct from a regular ring, since the first and last vertices are not connected. We do not consider a regular ring since it would imply that the underlying signal is periodic.
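A minimal sketch of the banded weight matrix (5.1), with 0-based indices. Reading the case $|n - m| = 0$ as a self-connection of weight $w_{\mathcal{S}}$ is our interpretation of (5.1); the function name and parameter values are hypothetical.

```python
import numpy as np


def slow_graph(N, L, w_s):
    """Weight matrix of the slow graph S(N, L) in (5.1).

    Vertices n and m (0-based) are joined with weight w_s when |n - m| <= L,
    so the diagonal carries a self-connection of weight w_s.
    """
    if 2 * L + 1 > N:
        raise ValueError("Definition 5.1 requires 2L + 1 <= N")
    n = np.arange(N)
    band = np.abs(n[:, None] - n[None, :]) <= L
    return np.where(band, w_s, 0.0)


W_slow = slow_graph(10, 2, 0.8)
```

Interior vertices have exactly $2L$ neighbors besides themselves, the end vertices only $L$, and the first and last vertices are not joined, so the model is a band rather than a ring.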

The fast graph model. We now consider the model for a patch-graph built from a patch-set 
comprising only fast patches. As demonstrated in section 3, most of the entries in W have 
similar sizes, and appear to be scattered throughout the matrix: temporal/spatial proximity 
does not correlate with proximity in patch space. In fact, fast patches are all far away from 
one another. We therefore define the fast graph model as follows. 

Definition 5.2. The fast graph $\mathcal{F}(N,p)$ is a random weighted graph composed of $N$ vertices $x_1, \ldots, x_N$. The weight on the edge $\{x_n, x_m\}$ is defined by

$$w_{n,m} = w_{m,n} = \begin{cases} w_{\mathcal{F}} & \text{with probability } p,\\ 0 & \text{with probability } 1 - p, \end{cases} \qquad \text{if } 1 \le n < m \le N,$$

and

$$w_{n,m} = 1 \quad \text{if } n = m.$$

The weight $w_{\mathcal{F}}$ is a positive real number that models the distance between two fast patches. The fast graph model is equivalent to a weighted version of the Erdős-Rényi graph model [19], except that $\mathcal{F}(N,p)$ contains self-connections and has edge weights possibly less than one. The parameter $p$ controls the density of the edges; $p = 1$ corresponds to a fully connected graph (clique).
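Both definitions translate directly into weight matrices. The following NumPy snippet is a minimal sketch, assuming unit weights by default and 0-based vertex indices; `slow_graph` and `fast_graph` are hypothetical helper names used for illustration, not code from this paper.

```python
import numpy as np

def slow_graph(N, L, w_s=1.0):
    """Weight matrix of S(N, L): w_{n,m} = w_s if |n - m| <= L, else 0, as in (5.1)."""
    if 2 * L + 1 > N:
        raise ValueError("Definition 5.1 requires 2L + 1 <= N")
    idx = np.arange(N)
    band = np.abs(idx[:, None] - idx[None, :]) <= L   # the band includes the diagonal
    return w_s * band.astype(float)

def fast_graph(N, p, w_f=1.0, rng=None):
    """One realization of F(N, p): each pair n < m is joined independently with probability p."""
    rng = np.random.default_rng(rng)
    upper = np.triu(rng.random((N, N)) < p, k=1)      # coin flips for the pairs n < m
    W = w_f * (upper | upper.T).astype(float)         # symmetrize: w_{n,m} = w_{m,n}
    np.fill_diagonal(W, 1.0)                          # self-connections w_{n,n} = 1
    return W

W_s = slow_graph(16, 2)          # interior rows have 2L + 1 = 5 nonzero entries
W_f = fast_graph(50, 0.3, rng=0)
```

The banded structure of `W_s` and the scattered entries of `W_f` correspond to the upper-left and lower-right quadrants of the weight matrix shown in Figure 7.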

¹We assume that the rows and columns of $W$ are ordered according to increasing index $n$ of the sequence $\{x_n\}$. This assumption affects neither the graph's parametrization nor our theoretical conclusions, but allows us to interpret the structure in $W$.
Figure 6. The fused graph model $\Gamma^*(N)$ is composed of a slow graph $\mathcal{S}(N/2,L)$ (blue) and a fast graph $\mathcal{F}(N/2,p)$ (orange), connected by random edges (green).




Figure 7. The weight matrix $W$ of the fused graph model $\Gamma^*(256)$ displayed as an image: $w_{n,m}$ is encoded as a grayscale value, from white ($w_{n,m} = 0$) to black ($w_{n,m} = 1$). The entries of $W$ associated with the slow graph appear in the upper-left quadrant of $W$. Entries associated with the fast graph appear in the lower-right quadrant. Random edges between the fast graph and the slow graph appear in the upper-right and lower-left quadrants.

The fused graph model. The fused graph model exemplifies the patch-set associated with a signal, or an image, which exhibits regions of fast and slow changes. The fused graph combines a slow and a fast subgraph of equal size (see Figure 6).

Definition 5.3. The fused graph $\Gamma^*(N)$ is a weighted graph composed of a slow subgraph $\mathcal{S}(N/2,L)$ and a fast subgraph $\mathcal{F}(N/2,p)$. In addition, edges between $\mathcal{S}(N/2,L)$ and $\mathcal{F}(N/2,p)$ are created randomly and independently with probability $q$ and assigned the edge weight $w_c > 0$.



Edges between $\mathcal{S}(N/2,L)$ and $\mathcal{F}(N/2,p)$ ensure that $\Gamma^*(N)$ is connected (a requirement for the validity of the parametrization (4.6)). These edges allow us to model patches that are extracted from regions of the image that combine edges/transients and smooth intensity. If $q$ is so small that no edges are created between the two subgraphs, then an edge is placed at random between the two subgraphs to ensure that the final fused graph is connected.
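The fused model can be sketched in the same way. In the hypothetical `fused_graph` helper below (an illustration, not the authors' code), the slow subgraph occupies the first $N/2$ indices, matching the upper-left quadrant of Figure 7, and the fallback edge mirrors the rule just stated. Unit weights are assumed by default, and the connectivity of the fast block itself is not checked.

```python
import numpy as np

def fused_graph(N, L, p, q, w_s=1.0, w_f=1.0, w_c=1.0, rng=None):
    """One realization of Gamma*(N): S(N/2, L) on indices 0..N/2-1, F(N/2, p) on the rest."""
    if N % 2:
        raise ValueError("N is assumed to be even")
    rng = np.random.default_rng(rng)
    h = N // 2
    W = np.zeros((N, N))
    # Slow block: band of half-width L, diagonal included, as in (5.1).
    idx = np.arange(h)
    W[:h, :h] = w_s * (np.abs(idx[:, None] - idx[None, :]) <= L)
    # Fast block: Erdos-Renyi edges with probability p, unit self-connections.
    upper = np.triu(rng.random((h, h)) < p, k=1)
    W[h:, h:] = w_f * (upper | upper.T)
    W[np.arange(h, N), np.arange(h, N)] = 1.0
    # Cross edges: each slow/fast pair is joined independently with probability q.
    cross = rng.random((h, h)) < q
    if not cross.any():
        # No cross edge was drawn: place one at random so that the two halves are linked.
        cross[rng.integers(h), rng.integers(h)] = True
    W[:h, h:] = w_c * cross
    W[h:, :h] = W[:h, h:].T
    return W

W = fused_graph(64, 4, 0.2, 1 / 64, rng=1)
```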

The true patch-graph is always constructed using a $\nu$ nearest neighbor rule (see section 2): each patch is connected to $\nu$ other patches. In order to mimic a true patch-graph, we adjust the thickness $L$ of the slow subgraph to the density $p$ of the edge connections in the fast subgraph, so that on average each vertex in the fused graph is connected to $2L$ vertices. The number of edges between distinct vertices in $\mathcal{F}(N,p)$ is a binomial random variable with expectation $\binom{N}{2}p = \frac{N(N-1)}{2}p$. Since the total number of edges between distinct vertices of $\mathcal{S}(N,L)$ is equal to²

$$\sum_{j=1}^{L} (N - j) = NL - \frac{L(L+1)}{2}, \qquad (5.2)$$

we choose

$$p = \frac{2L}{N-1} - \frac{L(L+1)}{N(N-1)}. \qquad (5.3)$$



This choice of $p$ guarantees that the expected number of edges in $\mathcal{F}(N,p)$ is equal to the number of edges in $\mathcal{S}(N,L)$. Furthermore, provided that $L = O(\ln N)$, a short computation shows that, for large values of $N$, this choice of $p$ also ensures that the expected degree of a vertex in $\mathcal{F}(N,p)$ is equal to the average degree of a vertex in $\mathcal{S}(N,L)$. Figure 7 shows the nonzero entries in the weight matrix associated with one realization of the fused graph model using parameters $N = 256$, $L = \lceil 2\ln N \rceil = 12$ and $q = 1/N$. Vertices $x_n$ with $n \le 128$ are connected to other slow vertices $x_m$ only if $|n - m| \le L$. This connectivity mimics the spatial (temporal) connectivity present in the smooth parts of an image (signal).
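As a quick arithmetic check of (5.2) and (5.3), the snippet below computes $p$ for the Figure 7 parameters and confirms that the expected number of edges of $\mathcal{F}(N,p)$ matches the edge count of $\mathcal{S}(N,L)$, and that the expected degree $(N-1)p$ is close to $2L$. The helper name `matched_density` is our own.

```python
import math

def matched_density(N, L):
    """Edge probability p of F(N, p) given by (5.3)."""
    return 2 * L / (N - 1) - L * (L + 1) / (N * (N - 1))

N = 256
L = math.ceil(2 * math.log(N))            # L = ceil(2 ln N) = 12
p = matched_density(N, L)

slow_edges = N * L - L * (L + 1) / 2      # right-hand side of (5.2)
fast_edges = p * N * (N - 1) / 2          # expected (binomial) edge count of F(N, p)
fast_degree = (N - 1) * p                 # expected degree of a vertex in F(N, p)

print(f"L = {L}, p = {p:.4f}")
print(f"edges: slow = {slow_edges:.0f}, fast (expected) = {fast_edges:.2f}")
print(f"expected fast degree = {fast_degree:.2f} vs 2L = {2 * L}")
```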

5.3. The main result. Our goal is to understand the effect of the embedding $\Phi$ defined by (4.8) on the fused graph. It turns out that studying the embedding of each individual subgraph (slow and fast) separately is much more tractable than considering the entire fused graph. To complement our theoretical study of the fast and the slow subgraphs, we provide numerical evidence in section 5.4 indicating that our understanding of the embedding of the subgraphs can be used to analyze the embedding of the fused graph. In section 6, we confirm that our theoretical analysis can be applied to true patch-graphs. Instead of studying $\Phi$ directly, we take advantage of the fact that the embedding $\Phi$ almost preserves the commute time (see (4.9)). We can therefore understand the effect of the embedding on the distribution of mutual distances $\|\Phi(x_n) - \Phi(x_m)\|$ within a subgraph by studying the distribution of the commute times $\kappa(x_n, x_m)$ on that subgraph. While computing the commute time on the slow graph may appear to be a straightforward affair, the computation rapidly becomes intractable. For this reason we provide a lower bound for the average commute time on the slow subgraph and an upper bound for the average commute time on the fast subgraph. This is sufficient for our needs, since the two bounds rapidly separate even for low values of $N$. To estimate these bounds, we rely on the connection between commute times on a graph and effective resistance on the corresponding electrical network [10, 16]. Specifically, we map a graph to an electrical circuit as follows: each edge with weight $w_{n,m}$ becomes a resistor with resistance $1/w_{n,m}$, and the vertices of the graph are the nodes of the circuit. Given two vertices $x_n$ and $x_m$ in the circuit, one can compute the effective resistance $R_{n,m}$ between these nodes. The key result [10] is that $\kappa(x_n, x_m) = V R_{n,m}$, where $V$ is the volume of the graph.
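The circuit analogy can be evaluated numerically with the standard identity $R_{n,m} = G_{nn} + G_{mm} - 2G_{nm}$, where $G = (D - W)^{+}$ is the Moore-Penrose pseudoinverse of the combinatorial Laplacian of a connected graph. The sketch below is our own illustration (the identity is standard, not taken from this paper); it returns all commute times $V R_{n,m}$ at once, and its test recovers the path formula $2(N-1)|m-n|$ quoted in the next paragraph.

```python
import numpy as np

def commute_times(W):
    """All-pairs commute times kappa(x_n, x_m) = V * R_{n,m} for a connected weighted graph."""
    d = W.sum(axis=1)                        # degrees (self-connection weights included)
    V = d.sum()                              # volume of the graph
    G = np.linalg.pinv(np.diag(d) - W)       # Laplacian pseudoinverse; self-loops cancel in D - W
    g = np.diag(G)
    R = g[:, None] + g[None, :] - 2 * G      # effective resistances R_{n,m}
    return V * R

# Path on N vertices with unit resistors and no self-connections.
N = 10
W = np.zeros((N, N))
i = np.arange(N - 1)
W[i, i + 1] = W[i + 1, i] = 1.0
K = commute_times(W)
```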

²This is equivalent to the number of entries along the first $L$ upper diagonals of the matrix $W$.

Before stating the main lemma, let us take a moment to compute some rough estimates of the commute times on the slow and fast graphs. To get some quick answers, we consider the simplest versions of the two graph models. When $L = 1$, the slow graph $\mathcal{S}(N, 1)$ is a path with self-connections. On a path of $N$ vertices without self-connections, the commute time between vertices $x_n$ and $x_m$ is equal to $2(N-1)|m-n|$. Therefore, the average commute time (computed over all pairs of vertices) on a path of length $N$ is $O(N^2)$. While it would make sense that adding edges to a path should decrease the commute time, this is usually not true [27]. Nevertheless, the presence of edges that allow the random walk to move forward by a distance $L$ at each time step leads us to conjecture that the average commute time on $\mathcal{S}(N, L)$ should be of the order $\frac{1}{L}O(N^2)$. In fact, as we will see in Lemma 5.6, the average commute time of the slow graph is of the order $\frac{1}{L^2}O(N^2)$. With regard to the fast graph, we can analyze the case where the density of edges is $p = 1$. In this case, the fast graph $\mathcal{F}(N, 1)$ is a complete graph, or clique, and every vertex is connected to every other vertex. In a complete graph, the average commute time is $O(N)$. Since the fast graph can be regarded as a complete graph whose edges have been removed with probability $1 - p$, we expect the commute time to be slightly larger than $O(N)$, since removing edges restricts the random walker's options to get from one vertex to another. Again in agreement with our intuition, Lemma 5.6 asserts that in the fast graph the commute time is of the order of $[L\ln(N)/\ln(L)]\,O(N)$.
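The two rough estimates can be made explicit. For the path (unit weights, no self-connections) we average $2(N-1)|m-n|$ over all pairs, using the fact that there are $N - d$ pairs at distance $d$. For the clique we assume unit weights and no self-connections, so that $V = N(N-1)$ and the effective resistance between any two vertices of $K_N$ is $2/N$:

```latex
\begin{aligned}
\frac{2}{N(N-1)}\sum_{n<m} 2(N-1)(m-n)
  &= \frac{4}{N}\sum_{d=1}^{N-1} d\,(N-d)
   = \frac{4}{N}\cdot\frac{N(N-1)(N+1)}{6}
   = \frac{2(N^2-1)}{3} = O(N^2),\\
\kappa_{K_N}(x_n, x_m) &= V R_{n,m} = N(N-1)\cdot\frac{2}{N} = 2(N-1) = O(N).
\end{aligned}
```

With the unit self-connections of $\mathcal{F}(N,1)$, the resistance is unchanged while the volume becomes $N^2$, giving a commute time of $2N$, which is still $O(N)$.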

We are now ready to state the main lemma. Our results will be stated in terms of the "average behavior" of the commute time on each graph, a concept that we need to define properly. In the case of the slow graph, which is deterministic, we consider the average commute time computed over all pairs of vertices.

Definition 5.4. Let $\kappa_{\mathcal{S}}$ be the average commute time between vertices in the slow graph $\mathcal{S}(N,L)$:

$$\kappa_{\mathcal{S}} = \frac{2}{N(N-1)} \sum_{n<m} \kappa(x_n, x_m). \qquad (5.4)$$



In the case of the fast graph, the "average behavior" of the commute time needs to be defined more carefully. Indeed, each fast graph is a realization of a stochastic process, and therefore we need to consider the expectation of the commute time. More precisely, given a realization $\mathcal{F}$ of a fast graph, we compute the expected commute time $E_{x_n,x_m}[\kappa \mid \mathcal{F}]$ as the expectation of $\kappa(x_m, x_n)$ over all possible random assignments of the vertices $x_n$ and $x_m$. We then need to consider how $E_{x_n,x_m}[\kappa \mid \mathcal{F}]$ varies as a function of $\mathcal{F}$. Therefore, we compute a second expectation over all possible random graphs $\mathcal{F}$.

Definition 5.5. The expected commute time $\kappa_{\mathcal{F}}$ on a fast graph $\mathcal{F}$ generated according to Definition 5.2 is defined by

$$\kappa_{\mathcal{F}} = E_{\mathcal{F}}\big[E_{x_n,x_m}[\kappa \mid \mathcal{F}]\big], \qquad (5.5)$$

where the inner expectation is computed over all random assignments of the vertices $x_n, x_m$ given a realization $\mathcal{F}$ of a fast graph geometry, and the outer expectation is computed over all possible realizations $\mathcal{F}$ of the fast graph.
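Definitions 5.4 and 5.5 suggest a direct numerical estimate. The sketch below assumes unit weights, uniformly random pairs of distinct vertices for the inner expectation, and a finite sample of graphs for the outer one; it computes $\kappa_{\mathcal{S}}$ exactly and estimates $\kappa_{\mathcal{F}}$ by a sample mean. Draws of $\mathcal{F}(N,p)$ that happen to be disconnected (infinite commute time) are skipped, which is our own convention. All function names are hypothetical.

```python
import numpy as np

def commute_times(W):
    """kappa(x_n, x_m) = V * R_{n,m}, with R from the Laplacian pseudoinverse (connected graphs)."""
    d = W.sum(axis=1)
    G = np.linalg.pinv(np.diag(d) - W)
    g = np.diag(G)
    return d.sum() * (g[:, None] + g[None, :] - 2 * G)

def mean_over_pairs(K):
    """2 / (N (N - 1)) * sum_{n < m} K[n, m], as in (5.4)."""
    return K[np.triu_indices(K.shape[0], k=1)].mean()

def kappa_slow(N, L):
    idx = np.arange(N)
    W = (np.abs(idx[:, None] - idx[None, :]) <= L).astype(float)  # S(N, L) with w_S = 1
    return mean_over_pairs(commute_times(W))

def kappa_fast(N, p, n_graphs=25, seed=0):
    rng = np.random.default_rng(seed)
    inner = []
    for _ in range(n_graphs):
        upper = np.triu(rng.random((N, N)) < p, k=1)
        W = (upper | upper.T).astype(float)
        np.fill_diagonal(W, 1.0)                                  # F(N, p) with w_F = 1
        lap = np.diag(W.sum(axis=1)) - W
        if np.linalg.eigvalsh(lap)[1] < 1e-9:                     # disconnected draw: skipped
            continue
        inner.append(mean_over_pairs(commute_times(W)))           # E_{x_n, x_m}[kappa | F]
    return float(np.mean(inner))                                  # sample mean over graphs F

N = 256
L = int(np.ceil(2 * np.log(N)))
p = 2 * L / (N - 1) - L * (L + 1) / (N * (N - 1))                # (5.3)
ratio = kappa_fast(N, p, n_graphs=5) / kappa_slow(N, L)
print(f"kappa_F / kappa_S ~ {ratio:.3f}")
```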



Lemma 5.6. We have

$$\big(N(2L+1) - L(L+1)\big)\cdots \le \kappa_{\mathcal{S}}. \qquad (5.6)$$

We also have

$$\kappa_{\mathcal{F}} \le \cdots \qquad (5.7)$$

provided that, for all assignments of the vertices $x_m$ and $x_n$, and for all fast graphs $\mathcal{F}$, the covariance $\mathrm{Cov}(M, R_{m,n})$ between the number of edges $M$ and the effective resistance $R_{m,n}$ of the associated electrical circuit is nonpositive.

Proof. The proofs are given in appendix B. ■

Because $L$ needs to grow logarithmically with $N$ (to ensure that $\mathcal{F}(N,p)$ is connected with high probability; see appendix A), we choose $L = c\ln N$ for some $c > 1$, and the upper bound on $\kappa_{\mathcal{F}}$ is then negligible relative to the lower bound on $\kappa_{\mathcal{S}}$ when $N$ is large.

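Corollary 5.7. Assume that $L = c\ln N$ for some constant $c > 1$. It follows that, as $N \to \infty$, the lower bound on $\kappa_{\mathcal{S}}$ grows like $N^2/(\ln N)^2$, and the upper bound on $\kappa_{\mathcal{F}}$ grows like $N(\ln N)^2/\ln\ln N$. Furthermore, the lower bound on $\kappa_{\mathcal{S}}$ grows faster than the upper bound on $\kappa_{\mathcal{F}}$, and so, with a probability that approaches one as $N \to \infty$,

$$\lim_{N \to \infty} \frac{\kappa_{\mathcal{F}}}{\kappa_{\mathcal{S}}} = 0.$$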

Proof. Notice that $\kappa_{\mathcal{S}}$ is bounded away from zero. Because the choice of $L$ guarantees that the fast graph is connected with a probability approaching one, $\kappa_{\mathcal{F}}$ is finite with probability approaching one. Therefore the ratio $\kappa_{\mathcal{F}}/\kappa_{\mathcal{S}}$ is bounded below by zero and above by the ratio of the bounds from Lemma 5.6. The ratio of the bounds goes to zero, which follows from a simple, but lengthy, limit calculation. ■

We can translate the corollary in terms of the mutual distance between vertices of the subgraphs after the embedding $\Phi$: $\Phi(\mathcal{F}(N,p))$ will be more concentrated than $\Phi(\mathcal{S}(N,L))$.

5.4. Spectral decomposition of commute times on the graph models. The results of section 5.3 apply to the exact commute times on the graph models. However, as mentioned in section 4.3, it is more practical to use a truncated version of the spectral expansion of the commute time, defined by Equation (4.3). We also noticed that the commute time encompasses the short term evolution ($t \approx 0$) as well as the asymptotic regime ($t \to \infty$) of the random walk. Neglecting the eigenvalues $\lambda_k$ for large $k$ emphasizes the long term behavior of the random walk, and we expect that it should further increase the difference between the slow and fast graphs. In this section, we confirm experimentally that approximating the commute times by truncating the expansion (4.3) actually emphasizes the separation between the fast subgraph and the slow subgraph in the fused graph model. In all the numerical experiments in this section, unless otherwise stated, we fix $N = 1024$, $L = \lceil 2\ln N \rceil$, $p$ chosen according to (5.3), $q = 1/N$, and $w_{\mathcal{S}} = w_{\mathcal{F}} = w_c = 1$. In all experiments, we compute the eigenvalues $\{\lambda_k\}$ of the matrix $D^{-1/2} W D^{-1/2}$ associated with the fast graph, the slow graph, and the fused graph.
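A minimal sketch of the spectral route follows. It assumes that (4.3) takes the standard form $\kappa(x_n,x_m) = \sum_{k \ge 2} (1-\lambda_k)^{-1}\big(\phi_k(x_n)/\sqrt{\pi_n} - \phi_k(x_m)/\sqrt{\pi_m}\big)^2$, with $\lambda_k, \phi_k$ the eigenpairs of $D^{-1/2}WD^{-1/2}$ and $\pi$ the stationary distribution, and that the embedding (4.8) uses the coordinates $\phi_k(x_n)/\sqrt{\pi_n(1-\lambda_k)}$ for $k = 2, \ldots, d'+1$; both forms are our reading rather than quotations from section 4. Keeping all $N-1$ terms should reproduce the resistance-based commute times, while `n_terms` $= d'$ gives the truncated $\kappa'$.

```python
import numpy as np

def spectral_commute(W, n_terms=None):
    """Commute times from the spectral expansion; n_terms = d' keeps only k = 2..d'+1."""
    d = W.sum(axis=1)
    pi = d / d.sum()                                         # stationary distribution
    lam, phi = np.linalg.eigh(W / np.sqrt(np.outer(d, d)))   # eigenpairs of D^{-1/2} W D^{-1/2}
    lam, phi = lam[::-1], phi[:, ::-1]                       # decreasing order: lambda_1 = 1 first
    stop = None if n_terms is None else 1 + n_terms
    k = slice(1, stop)                                       # drop phi_1 = sqrt(pi)
    Phi = phi[:, k] / np.sqrt(pi)[:, None] / np.sqrt(1.0 - lam[k])  # rows are Phi(x_n)
    sq = (Phi ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * Phi @ Phi.T       # ||Phi(x_n) - Phi(x_m)||^2

# A small connected test graph: a band of half-width 2 plus a few random long-range edges.
rng = np.random.default_rng(0)
N = 30
idx = np.arange(N)
W = (np.abs(idx[:, None] - idx[None, :]) <= 2).astype(float)
extra = np.triu(rng.random((N, N)) < 0.05, k=1)
W = np.maximum(W, (extra | extra.T).astype(float))
K_full = spectral_commute(W)
K_trunc = spectral_commute(W, n_terms=3)
```

Since every term of the expansion is nonnegative for a connected graph, the truncated values can only underestimate the exact commute times.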
Slow and fast subgraphs: two different dynamics revealed by the spectral decomposition. We first provide a back-of-the-envelope computation of the spectrum of the slow and fast graphs. As we have noticed before, the slow graph model is a "fat" path. We know that the spectrum of a path without self-connections [11] is given by

$$\lambda_k = \cos\left[\frac{\pi(k-1)}{N-1}\right], \qquad k = 1, 2, \ldots, N.$$

We therefore expect that the eigenvalues associated with the slow graph will decay slowly away from one for small $k$. Figure 8 (inset) displays the eigenvalues associated with the slow graph model. As expected, the spectrum is flat around $k = 1$ and exhibits the slowest decay of all the graph models. We use the similarity between the fast graph model and the Erdős-Rényi graph to predict the spectrum of the fast graph. Except for $\lambda_1 = 1$, all the other eigenvalues of an Erdős-Rényi graph asymptotically follow the Wigner semicircle distribution [12]. Our numerical experiments confirm this prediction: as shown in Figure 8-right, the eigenvalues of the fast graph appear to be distributed along a semicircle.
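The path spectrum quoted above is easy to confirm numerically. This check (our own, on a small path with unit weights and no self-connections) compares the eigenvalues of $D^{-1/2}WD^{-1/2}$ with $\cos[\pi(k-1)/(N-1)]$.

```python
import numpy as np

N = 12
W = np.zeros((N, N))
i = np.arange(N - 1)
W[i, i + 1] = W[i + 1, i] = 1.0                   # path without self-connections
d = W.sum(axis=1)
A = W / np.sqrt(np.outer(d, d))                   # D^{-1/2} W D^{-1/2}
lam = np.sort(np.linalg.eigvalsh(A))[::-1]        # lambda_1 >= lambda_2 >= ... >= lambda_N
k = np.arange(1, N + 1)
predicted = np.cos(np.pi * (k - 1) / (N - 1))     # cos[pi (k - 1) / (N - 1)], k = 1..N
```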

The decay of the spectrum has a direct influence on the dynamics of the random walk. Specifically, the spectral gap controls the mixing rate, which measures the expected number of time-steps that are necessary to reduce the distance between the probability distribution after $t$ steps, $P^t_{n,m}$, and the stationary distribution $\pi_m$ by a certain factor [42]. This concept is justified by the fact that the convergence of $P^t_{n,m}$ is exponential [17], and is given by

$$\max_{n,m} \frac{|P^t_{n,m} - \pi_m|}{\pi_m} \le \frac{\lambda_{\max}^t}{\pi_{\min}}, \qquad t = 1, 2, \ldots \qquad (5.8)$$

where $\lambda_{\max} = \max\{\lambda_2, |\lambda_N|\}$ (which is related to the spectral gap), and $\pi_{\min}$ is the smallest entry of the stationary distribution. Since $\lambda_2$ is much larger in the slow graph than in the fast graph, we expect that convergence to the associated stationary distribution will take longer on the slow graph than on the fast graph.
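The inequality $\max_{n,m} |P^t_{n,m} - \pi_m|/\pi_m \le \lambda_{\max}^t/\pi_{\min}$ can be checked on a small example. The sketch below uses a slow graph $\mathcal{S}(40, 2)$ with unit weights (an arbitrary choice for illustration), the transition matrix $P = D^{-1}W$, and the first 50 steps; it records the smallest gap between the two sides.

```python
import numpy as np

N, L = 40, 2
idx = np.arange(N)
W = (np.abs(idx[:, None] - idx[None, :]) <= L).astype(float)   # S(40, 2), unit weights
d = W.sum(axis=1)
pi = d / d.sum()                                                # stationary distribution
P = W / d[:, None]                                              # transition matrix D^{-1} W
lam = np.sort(np.linalg.eigvalsh(W / np.sqrt(np.outer(d, d))))[::-1]
lam_max = max(lam[1], abs(lam[-1]))                             # max{lambda_2, |lambda_N|}

Pt = np.eye(N)
worst_margin = np.inf
for t in range(1, 51):
    Pt = Pt @ P                                                 # t-step transition probabilities
    lhs = np.max(np.abs(Pt - pi[None, :]) / pi[None, :])
    rhs = lam_max ** t / pi.min()
    worst_margin = min(worst_margin, rhs - lhs)
print(f"lambda_max = {lam_max:.4f}, smallest margin rhs - lhs = {worst_margin:.3e}")
```

Because $\lambda_2$ of this banded graph is close to one, the right-hand side decays slowly, which is the behavior the text anticipates for the slow graph.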

The dynamic of the fused graph is enslaved by the slow subgraph. We now consider a random walk on the fused graph. If this random walk begins at $x_n$ in the fast subgraph of the fused graph, then after a small number of steps $t_0$, the probability of finding the random walker at any other vertex $x_m$ in the fast subgraph is close to the stationary distribution, $P^{t_0}_{n,m} \approx \pi_m$. On the other hand, during the same number of steps, a random walk initialized in the slow subgraph will only explore a small section of the slow subgraph, and consequently the transition probabilities will still be similar to their initial values, $P^{t_0}_{n,m} \approx P_{n,m}$. As a result, the restriction imposed by the geometry of the slow subgraph is expected to decrease the convergence rate of the transition probabilities on the fused graph. We confirm this analysis with experimental results. Figure 8 (inset) shows that for $k < 23$ the eigenvalues associated with the fused graph and the eigenvalues associated with the slow graph exhibit slow decay away from one, thereby slowing the convergence described by (5.8). For $25 < k < 400$, the eigenvalues of the fused graph decay at a rate similar to that of the fast graph. Finally, for $k > 400$ the eigenvalues of the fused graph join those of the slow graph (see also the histogram in Figure 8-right). We have observed experimentally that these transitions in the behavior of the spectrum of the fused graph are not affected by varying the parameters $N$, $L$, and $q$. We conclude that the slow subgraph has the largest influence on the first few (small $k$) eigenvalues $\lambda_k$ of the fused graph.

Figure 8. The eigenvalues $\lambda_k$ of the matrix $D^{-1/2} W D^{-1/2}$ associated with the fused (green), slow (blue), and fast (orange) graphs. Left: $\lambda_k$ as a function of $k$; right: histogram of the $\lambda_k$.

[Figure 9 panels: slow graph (left), fast graph (center), fused graph (right).]

Figure 9. The eigenvectors $\{\phi_1, \phi_2, \phi_8, \phi_{16}, \phi_{32}\}$ associated with the slow (left), fast (center), and fused (right) graphs. Right: the large amplitude of the eigenvectors $\phi_k$ on the first half of the vertices (blue), which belong to the slow subgraph, leads to a larger separation between the fast and slow subgraphs when truncating the commute time expansion.

The eigenvectors of the fused graph and their impact on the commute time. The transition exhibited in the spectrum of the fused graph can also be detected in the corresponding eigenvectors $\phi_k$. Figure 9 shows the eigenvectors $\{\phi_1, \phi_2, \phi_8, \phi_{16}, \phi_{32}\}$ corresponding to the three graph models. The first eigenvector $\phi_1$ has entries equal to the square root of the stationary distribution, $\phi_1(x_n) = \sqrt{\pi_n}$, and is not used in the expansion of the commute time (4.3). As expected, the random walk spends most of its time inside the slow subgraph of the fused graph, as indicated by the larger values of $\phi_1$ for the first (blue) $N/2$ vertices (see Figure 9-right). The eigenvectors $\{\phi_2, \phi_8, \phi_{16}\}$ of the fused graph exhibit large amplitude oscillations over the vertices belonging to the slow subgraph (first half, shown in blue, of the plots in Figure 9-right), which resemble those found in the eigenvectors associated with the slow graph (Figure 9-left). As $k$ increases, the eigenvectors $\phi_k$ of the fused graph become more and more similar to the eigenvectors of the fast graph.

The impact of the eigenvectors $\phi_k$ on the commute time on the fused graph can be analyzed by estimating the size of the terms

$$\frac{1}{1 - \lambda_k}\left(\frac{\phi_k(x_n)}{\sqrt{\pi_n}} - \frac{\phi_k(x_m)}{\sqrt{\pi_m}}\right)^2 \qquad (5.9)$$

in the spectral expansion (4.3) of the commute time $\kappa$. We claim that $\kappa(x_n, x_m)$ will be small if both vertices $x_n$ and $x_m$ are in the fast subgraph, and that $\kappa$ will be large if either vertex is in the slow subgraph.

We can first estimate the size of $\phi_k(x_n)/\sqrt{\pi_n} - \phi_k(x_m)/\sqrt{\pi_m}$. We observe that the eigenvectors $\phi_k$ for small values of $k$ have large amplitude oscillations on vertices belonging to the slow subgraph, but are relatively constant on the fast subgraph (see Figure 9-right). Therefore, for small values of $k$, each term (5.9) will be small when $x_n$ and $x_m$ both belong to the fast subgraph (we also have $\pi_n \approx \pi_m$ when two vertices belong to the same subgraph). Conversely, these terms will be large when either $x_n$ or $x_m$ belongs to the slow subgraph. While this analysis of the size of the terms (5.9) only holds for small values of $k$, it turns out that these are the terms that have the largest influence in the expansion of the commute time (4.3). Indeed, the spectrum of the fused graph decays slowly, so the first few coefficients $(1 - \lambda_k)^{-1}$ in the commute time expansion (4.3) are much larger than the remaining ones; consequently, the terms (5.9) for small values of $k$ provide the largest contribution to the commute time.

We conclude that $\kappa(x_n, x_m)$ is small when $x_n$ and $x_m$ belong to the fast subgraph, and $\kappa(x_n, x_m)$ is large when either vertex is in the slow subgraph. Furthermore, we expect that this difference will be further magnified if we replace the exact expansion of $\kappa$ in (4.3) by an approximation that only includes the first few values of $k$.

The truncated spectral expansion of the commute time increases the contrast between the slow 
and fast subgraphs. We finally come to the heart of the section: the numerical computation 
of the average approximate commute time defined by 

Because of (4.7), we expect that $\kappa'$ will be close to the true commute time $\kappa$. We compute $\kappa'$ for the three graphs: slow, fast and fused. We generated 25 realizations of the fast and fused graphs, and we estimated the expected commute time with the sample mean, given by $\kappa'$ in (5.10).

Figure 10. $\kappa'_{\mathcal{F}}/\kappa'_{\mathcal{S}}$ as a function of the dimension $d'$ of the embedding $\Phi$, for several values of the number of vertices $N$. Left: slow $\mathcal{S}$ and fast $\mathcal{F}$ graphs separately; right: slow and fast subgraphs in the fused graph $\Gamma^*$.

Figure 11. Histogram of $\kappa'$. Left: slow graph $\mathcal{S}$ and fast graph $\mathcal{F}$. Right: $\kappa'$ for the three types of transition between the subgraphs of the fused graph $\Gamma^*$. Note the logarithmic scale on the horizontal axes.

Figure 12. $\kappa'$ as a function of $N$. Left: slow graph $\mathcal{S}$ and fast graph $\mathcal{F}$. Right: $\kappa'$ for the three types of transition between the subgraphs of the fused graph $\Gamma^*$.

Figure 10-A displays $\kappa'_{\mathcal{F}}/\kappa'_{\mathcal{S}}$ as a function of the number of terms $d'$ used in the embedding (4.8), for several values of the number of vertices $N$, for the slow and fast graphs. Our theoretical analysis of $\kappa_{\mathcal{F}}/\kappa_{\mathcal{S}}$, performed in Corollary 5.7, is only valid for large values of $N$. Nevertheless, our numerical simulations indicate that even for very low values of $N$, $\kappa'_{\mathcal{F}}$ is already smaller than $\kappa'_{\mathcal{S}}$, since all ratios are below one (see Figure 10-A). Furthermore, this ratio is even smaller for smaller values of $d'$. We observe similar results when the commute times $\kappa'_{\mathcal{S}}$ and $\kappa'_{\mathcal{F}}$ are computed within the slow and the fast subgraphs of the fused graph (see Figure 10-B). These results confirm that the embedding $\Phi$ will further concentrate the vertices of the fast graph if $d'$ is chosen to be much smaller than $N$. We have observed experimentally that choosing $d' \approx \ln(N)$ leads to the smallest ratio of averages, not only on the graph models, but also on the general patch-graphs studied in section 6.
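The effect of the truncation can be reproduced on a smaller scale. This sketch assumes $\kappa'(x_n,x_m) = \|\Phi(x_n)-\Phi(x_m)\|^2$ with coordinates $\phi_k(x_n)/\sqrt{\pi_n(1-\lambda_k)}$, $k = 2, \ldots, d'+1$ (our reading of (4.3) and (4.8)). To keep it fast it uses $N = 256$ rather than 1024, unit weights, and a single realization of the fast graph, and it reports $\kappa'_{\mathcal{F}}/\kappa'_{\mathcal{S}}$ for $d' = \lceil \ln N \rceil$, $d' = 32$, and the full expansion $d' = N-1$. The helper names are hypothetical.

```python
import numpy as np

def truncated_commute(W, n_terms):
    """kappa'(x_n, x_m) = ||Phi(x_n) - Phi(x_m)||^2 with d' = n_terms coordinates (k = 2..d'+1)."""
    d = W.sum(axis=1)
    pi = d / d.sum()
    lam, phi = np.linalg.eigh(W / np.sqrt(np.outer(d, d)))
    lam, phi = lam[::-1][1:1 + n_terms], phi[:, ::-1][:, 1:1 + n_terms]  # skip lambda_1 = 1
    Phi = phi / np.sqrt(pi)[:, None] / np.sqrt(1.0 - lam)
    sq = (Phi ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * Phi @ Phi.T

def mean_over_pairs(K):
    return K[np.triu_indices(K.shape[0], k=1)].mean()

N = 256
L = int(np.ceil(2 * np.log(N)))                       # 12
p = 2 * L / (N - 1) - L * (L + 1) / (N * (N - 1))     # (5.3)
idx = np.arange(N)
W_slow = (np.abs(idx[:, None] - idx[None, :]) <= L).astype(float)
rng = np.random.default_rng(0)
upper = np.triu(rng.random((N, N)) < p, k=1)
W_fast = (upper | upper.T).astype(float)
np.fill_diagonal(W_fast, 1.0)

ratios = {}
for d_prime in (int(np.ceil(np.log(N))), 32, N - 1):  # ceil(ln N) = 6, 32, all N - 1 terms
    ratios[d_prime] = (mean_over_pairs(truncated_commute(W_fast, d_prime))
                       / mean_over_pairs(truncated_commute(W_slow, d_prime)))
    print(f"d' = {d_prime:3d}: kappa'_F / kappa'_S = {ratios[d_prime]:.4f}")
```

The slow graph concentrates its commute time in the first few terms, whose coefficients $(1-\lambda_k)^{-1}$ are large, whereas the fast graph spreads it over many terms of similar size; this is why a small $d'$ lowers the ratio.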
The enslaving of the fused graph by the slow graph is clearly shown in Figure 11-B, where the normalized histogram of $\kappa'$ is shown for the three types of transition between the subgraphs of the fused graph $\Gamma^*$: slow → slow, fast → fast, and slow → fast. The histogram of the slow → fast transition is very similar to the histogram of the slow → slow transition, clearly indicating that once the random walk is trapped in the slow subgraph, the presence of the fast subgraph does not help the random walk escape from the slow graph. We also notice that the average of $\kappa'$ for the fast → fast transition is roughly two orders of magnitude smaller than the average of $\kappa'$ for the slow → slow or slow → fast transitions. In addition, the variance of each distribution is small enough to limit the overlap between the distributions.

Figure 12 displays $\kappa'$ as a function of the number of vertices $N$, where $d' = \lceil \ln N \rceil$. Again, this result confirms that the asymptotic analysis of the ratio $\kappa_{\mathcal{F}}/\kappa_{\mathcal{S}}$ performed in Corollary 5.7 actually holds for very small values of $N$. Indeed, whether the slow and fast graphs are considered separately (Figure 12-A) or are the components of the fused graph (Figure 12-B), the ratio $\kappa'_{\mathcal{F}}/\kappa'_{\mathcal{S}}$ remains well below one (note the logarithmic scale). It is important to bear in mind that when analyzing images, $N$ is typically of the order of $10^{\cdots}$, and therefore our theoretical analysis will hold without any difficulty. Lastly, we again note in Figure 12-B that the transitions slow → fast in the fused graph have the same dynamics as the transitions slow → slow.

5.5. Summary of the experiments. We have confirmed experimentally that embedding the fused graph using $\Phi$ shrinks the mutual distance between vertices of the fast subgraph, effectively concentrating these vertices closer to one another. As a result, the embedding helps divide the fused graph into the slow and the fast subgraphs by concentrating the vertices of the fast subgraph away from the vertices of the slow subgraph. Our analysis of the embedding is based on the fact that $\Phi$ approximately preserves the commute time measured on the fused graph. Furthermore, we have demonstrated that a truncated version of the commute time, $\kappa'$, is even more conducive to identifying vertices of the fast subgraph of the fused graph.

The implication of these results is that the embedding of the true patch-graph $\Gamma$ using $\Phi$ will concentrate the "anomalous" patches, which contain rapid changes in the signal, away from the baseline patches. This concentration of the fast anomalous patches happens for values of the embedding dimension $d'$ that are of the order of $\ln(N)$: this choice of $d'$ results in a low-dimensional embedding of the patch-graph. Because the fast patches are more clustered after embedding, their detection (for the purpose of anomaly detection, classification, or segmentation) will become much easier. Finally, we note that our theoretical analysis can be extended to a more general context where patches are replaced by a vector of local features extracted from elements of a large dataset. The only requirement is that the graph of features exhibit a geometry similar to the fused graph $\Gamma^*$.

6. Numerical experiments with synthetic signals. In this section we validate our theoretical results using synthetic signals. Each signal is the realization of a stochastic process with a prescribed autocorrelation function. We study two types of stochastic processes: one that generates signals that transition from low to high local frequency, and a second one that yields signals with varying local smoothness. We argue that these signals embody the types of local changes that are of fundamental importance in many areas of image processing. For both classes of signals, we embed the patch-sets using $\Phi$ in (4.8). We study the properties of the embedding by quantifying the average commute time $\kappa'$, defined in (5.10), between fast and slow patches, and we compare the numerical results with the theoretical predictions given in section 5.

Figure 13. A realization of the time-frequency model. The low frequency portion ($\beta_{\mathcal{S}} = 8$) is shown in blue; the high frequency portion ($\beta_{\mathcal{F}} = 256$) is shown in orange. There are four subintervals ($\mu = 3$).

Figure 14. A realization of the local regularity model. The smooth portion ($H_{\mathcal{S}} = 0.9$) is shown in blue; the irregular portion ($H_{\mathcal{F}} = 0.3$) is shown in orange. There are four subintervals ($\mu = 3$).

6.1. The signals. We consider two types of models: a time-frequency signal model and a local regularity signal model. Each model is characterized by an autocorrelation function, which can be modified using a parameter that controls the local frequency or the local regularity of the signal. We partition the interval [0, 1] into subintervals over which the autocorrelation parameter is kept constant. The parameter alternates between two different values, creating subintervals of alternating local frequency or alternating local regularity. The number of alternations is chosen randomly according to a homogeneous Poisson process with intensity $\mu$: there are on average $\mu + 1$ subintervals. A simpler version of this model has been used in [13] to mimic the presence of edges in images. Unlike the model used in [13], we adjust the signal defined on each subinterval so that the result is continuous on [0, 1]. In all experiments that we report here we use $\mu = 3$. The autocorrelation function
associated with the time-frequency signal model is given by

$$E\big(x(t)\,x(t + \tau)\big) = \left(\frac{1 + \cos(2\pi\tau)}{2}\right)^{\beta},$$

where $t \in [0, 1)$ and $\beta > 0$. As the autocorrelation parameter $\beta$ increases, the range of frequencies present in the signal also increases. Figure 13 displays a realization of this model where the signal's covariance parameter alternates four times between $\beta_{\mathcal{S}} = 8$ and $\beta_{\mathcal{F}} = 256$. See appendix C for more on generating a signal from the time-frequency signal model. The autocorrelation function associated with the local regularity signal model is equal to that of fractional Brownian motion, given by



$$E\big(x(\tau_1)\,x(\tau_2)\big) = \frac{1}{2}\left(|\tau_1|^{2H} + |\tau_2|^{2H} - |\tau_2 - \tau_1|^{2H}\right),$$

where $H$ is the Hurst parameter. As $H$ decreases, the local regularity decreases. A realization of this model is shown in Figure 14, where the signal's covariance parameter alternates four times between $H_{\mathcal{S}} = 0.9$ and $H_{\mathcal{F}} = 0.3$. We use the method described in [1] to generate the fractional Brownian motion.
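For readers who want to experiment, here is a sketch of single-segment generators for the two models. It swaps in two standard techniques: circulant embedding via the FFT for the periodic autocorrelation $((1+\cos 2\pi\tau)/2)^\beta$ (our reading of the time-frequency model), and the exact Cholesky method applied to fractional Gaussian noise for fractional Brownian motion. Neither is claimed to be the method of [1] or of appendix C, and the alternating subintervals and the continuity adjustment are not implemented. The function names are hypothetical.

```python
import numpy as np

def tf_segment(n, beta, rng):
    """Stationary Gaussian samples at t = k/n, k = 0..n-1, with autocorrelation ((1 + cos 2 pi tau)/2)^beta."""
    tau = np.arange(n) / n
    c = ((1.0 + np.cos(2 * np.pi * tau)) / 2.0) ** beta     # first row of the circulant covariance
    lam = np.clip(np.fft.fft(c).real, 0.0, None)           # its eigenvalues (clip rounding noise)
    w = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    # sqrt(n) * ifft(sqrt(lam) * w) has real and imaginary parts with covariance equal to the circulant.
    return (np.sqrt(n) * np.fft.ifft(np.sqrt(lam) * w)).real

def fbm(n, H, rng):
    """Fractional Brownian motion at t = k/n, k = 0..n, via Cholesky factorization of the fGn covariance."""
    k = np.arange(n)
    gamma = 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * np.abs(k) ** (2 * H) + np.abs(k - 1) ** (2 * H))
    G = gamma[np.abs(k[:, None] - k[None, :])]             # Toeplitz covariance of unit-step fGn
    noise = np.linalg.cholesky(G) @ rng.standard_normal(n)
    # Cumulative sums give fBm on the integers; self-similarity rescales it to [0, 1].
    return np.concatenate(([0.0], np.cumsum(noise))) * n ** (-H)

rng = np.random.default_rng(0)
x_low = tf_segment(1024, 8, rng)      # beta_S = 8
x_rough = fbm(1024, 0.3, rng)         # H_F = 0.3
```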

6.2. Embedding the patch-graph. For each realization of a specific signal model, we construct a patch-set of $N = 1024$ maximally overlapping patches. The patch size is $d = 32$ for the time-frequency model and $d = 16$ for the local regularity model. We compute the embedding $\Phi$ (4.8) and keep $d'$ eigenvectors $\phi_k$. Figure 15 shows the patch-set associated with the realization of the time-frequency signal displayed in Figure 13 before (left) and after (right) embedding. The scatterplot before embedding is computed using the first three principal components. Figure 16 shows the patch-set associated with the realization of the local regularity signal displayed in Figure 14 before (left) and after (right) embedding. The fast patches of the time-frequency signal are the orange patches extracted from the high frequency segments. The slow patches are the blue patches extracted from the low frequency sections. Similarly, the fast patches of the local regularity signal are the orange patches extracted from the irregular segments, and the slow patches are the blue patches extracted from the smooth sections. For both signals, the fast patches are scattered across the space before embedding. After embedding, the fast patches are aligned along smooth curves. This visual impression is confirmed by computing the mutual distance between patches after embedding, $\|\Phi(x_n) - \Phi(x_m)\|$. In principle, we should report the value of the Lipschitz ratio

$$\frac{\|\Phi(x_n) - \Phi(x_m)\|}{\|x_n - x_m\|}$$

to quantify the contraction experienced through the mapping $\Phi$. However, we have noticed that because the mutual distances $\|x_n - x_m\|$ between fast patches are always large (as explained in section 3), the Lipschitz ratio ends up being always small for fast patches. Therefore studying the size of the Lipschitz ratio associated with $\Phi$ does not reveal whether the map concentrates the fast patches or not, but only indicates that the sampling of the fast patches (in the patch-set) is coarse. For this reason we prefer to study how $\|\Phi(x_n) - \Phi(x_m)\|$ varies for pairs of slow and fast patches. Based on our theoretical analysis, we expect that after the embedding the mutual distance between fast patches will become much shorter than the mutual distance between slow patches.

Figure 15. Patch-set of the time-frequency signal (see Figure 13) before (left) and after (right) embedding. The color code matches the colors used in the plot of the signal: blue = low frequency, orange = high frequency.

Figure 16. Patch-set of the local regularity signal (see Figure 14) before (left) and after (right) embedding. The color code matches the colors used in the plot of the signal: blue = smooth, orange = irregular.

Figure 17. $\sqrt{\kappa'}$ for slow (blue) and fast (orange) patches for the time-frequency model (left) and the local regularity model (right) as a function of the "roughness" of the fast patches. The slow patches were generated using $\beta_{\mathcal{S}} = 8$ (left) and $H_{\mathcal{S}} = 0.9$ (right).

We point out that the eigenvectors $\phi_k$ used in the embedding $\Phi$ (4.8) are designed to have, on average, small gradients (as measured along edges of the graph). Indeed, these eigenvectors are also eigenvectors of the normalized graph Laplacian [11], and therefore minimize a Rayleigh ratio that quantifies the average norm of the gradient of $\phi_k$. Thus, if we further
restricted our computation of the commute times inside each subset of fast and slow patches 
to only those patches that were connected by an edge in the graph, we would expect to see 
smaller values and little dependence on whether or not the patch was fast or slow. However, 
since our theoretical analysis of section 5 is based on the average commute time between all 
vertices belonging to the fast or slow graph models, we choose to compute the commute times 
between all patches, not just between patches that are connected with an edge. 
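A minimal sketch of how an embedding built from the eigenvectors of $P = D^{-1}W$ turns commute times into squared Euclidean distances. We assume that (4.8) has the form $\Phi(\mathbf{x}_n) = \sqrt{\mathcal{V}}\,[\phi_k(n)/\sqrt{1-\lambda_k}]_{k \ge 2}$, where $\mathcal{V} = \sum_{i,j} w_{i,j}$ and $\phi_k = D^{-1/2}\psi_k$ with $\psi_k$ the orthonormal eigenvectors of $D^{-1/2} W D^{-1/2}$; this normalization is our assumption. The reference values use the classical identity (commute time) $=\mathcal{V}\times$(effective resistance), with the resistance computed from the pseudoinverse of $D - W$. The graph must be connected.

```python
import numpy as np

def commute_time_embedding(W):
    """Spectral embedding whose squared distances equal commute times.

    Assumed form (a stand-in for (4.8)):
        Phi(x_n) = sqrt(V) * [phi_k(n) / sqrt(1 - lambda_k)]_{k >= 2},
    where phi_k = D^{-1/2} psi_k are right eigenvectors of P = D^{-1} W.
    """
    d = W.sum(axis=1)                     # degrees (self-loops included)
    vol = d.sum()                         # V = sum_{i,j} w_{i,j}
    S = W / np.sqrt(np.outer(d, d))       # D^{-1/2} W D^{-1/2}, symmetric
    lam, psi = np.linalg.eigh(S)          # ascending eigenvalues
    lam, psi = lam[::-1], psi[:, ::-1]    # lambda_1 = 1 comes first
    phi = psi / np.sqrt(d)[:, None]       # eigenvectors of P
    return np.sqrt(vol) * phi[:, 1:] / np.sqrt(1.0 - lam[1:])

def commute_times(W):
    """Reference values kappa(i, j) = V * R_eff(i, j) via the Laplacian pseudoinverse."""
    d = W.sum(axis=1)
    Lp = np.linalg.pinv(np.diag(d) - W)
    g = np.diag(Lp)
    return d.sum() * (g[:, None] + g[None, :] - 2.0 * Lp)
```

Because the coordinates are weighted by $(1-\lambda_k)^{-1/2}$, the eigenvectors with $\lambda_k$ close to 1 (the smoothest ones along the graph) dominate the distances.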

For each signal model, we compute the square root of the average approximate commute time
$$\kappa' = \frac{2}{N(N-1)} \sum_{n<m} \left\| \Phi(\mathbf{x}_n) - \Phi(\mathbf{x}_m) \right\|^2 \tag{6.3}$$



for pairs of patches $\mathbf{x}_n, \mathbf{x}_m$ that are either both fast or both slow. We study how $\kappa'$ varies as a function of the autocorrelation parameter that controls how irregular the fast patches are. $\kappa'$ was computed using ten realizations of each signal model. The slow patches were generated using $\beta_{\mathcal{S}} = 8$ and $H_{\mathcal{S}} = 0.9$. As before, we used $N = 1024$, with $d = 32$ for the time-frequency model and $d = 16$ for the local regularity model. We observed that the overall shapes of the curves in Figure 17 are invariant under variation of the parameters (as long as the ratio of the patch length to the average subinterval length remains less than 10%). The dimension $d'$ of the embedding used to compute $\kappa'$ was chosen so that $(1-\lambda_k)^{-1/2} < 0.1\,(1-\lambda_2)^{-1/2}$ for all $k > d' + 1$. Figure 17 shows $\sqrt{\kappa'}$ as a function of the frequency parameter (left) and of the smoothness parameter (right). We note that as the signal exhibits more rapid local changes (increasing $\beta_{\mathcal{F}}$ or decreasing $H_{\mathcal{F}}$), the associated fast patches are increasingly concentrated (smaller $\|\Phi(\mathbf{x}_n) - \Phi(\mathbf{x}_m)\|$) through the parametrization. These experiments confirm that the theoretical analysis can be applied to the true patch-set constructed from realistic signals.
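The two computational ingredients of this experiment, the choice of the embedding dimension $d'$ and the average of the squared embedded distances over pairs, can be sketched as follows. This is a hypothetical implementation: it assumes the eigenvalues are sorted so that `lam[0]` is $\lambda_1 = 1$ and `lam[1]` is $\lambda_2$, that the coordinates $k = 2, \dots, d'+1$ are the ones retained, and that the selection rule compares the coordinate weights $(1-\lambda_k)^{-1/2}$.

```python
import numpy as np

def choose_embedding_dim(lam, ratio=0.1):
    """Smallest d' such that (1 - lam_k)^(-1/2) < ratio * (1 - lam_2)^(-1/2) for all k > d' + 1.

    lam is 0-based here: lam[0] = lambda_1 = 1 and lam[1] = lambda_2.
    """
    scale = 1.0 / np.sqrt(1.0 - lam[1:])   # weights of coordinates k = 2, 3, ...
    small = scale < ratio * scale[0]
    # Keep coordinates k = 2..d'+1; every later coordinate must be small.
    for d_prime in range(1, len(scale) + 1):
        if np.all(small[d_prime:]):
            return d_prime
    return len(scale)

def average_commute_time(Y):
    """kappa' = 2 / (N (N - 1)) * sum_{n < m} ||Y_n - Y_m||^2 for embedded points Y."""
    N = len(Y)
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return sq[np.triu_indices(N, k=1)].sum() * 2.0 / (N * (N - 1))
```

In use, `Y` would hold the first $d'$ embedding coordinates of the fast patches only (or of the slow patches only), and the plotted quantity is `np.sqrt(average_commute_time(Y))`.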

7. Discussion. Using realistic graph models, probabilistic arguments, and the connection 
between the commute time of random walks on graphs and the embedding (4.8), we provided 
a theoretical explanation for the success of the methods that analyze and process images 
based on graphs of patches. Our results establish that the embedding of the patch-graph of 
an image based on the commute time between vertices of the graph reveals the presence of 
patches containing rapid changes in the underlying signal or image by concentrating these 
patches close to one another while leaving the patches extracted from the slowly changing 
portions of the signal organized along low-dimensional structures. 

7.1. Parameter selection. 

7.1.1. Choosing the patch size. In this work we are interested in the local behavior of the image, and therefore $d$ should remain of the order of what we consider to be the local scale. We also note that as $d$ becomes large, the number of available patches ($N/d$) becomes smaller, making the estimation of the geometry of the patch-set more difficult, since the patches now live in a high-dimensional space. Another consequence of the "curse of dimensionality" is that the distance between patches becomes less informative for large values of $d$. If the original signal is oversampled with respect to the true physical processes at stake, then one can coarsen the sampling of the patch-set in the image domain. In practice, it would be more advisable to coarsen the underlying continuous patch-set, which is a nontrivial question.
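For a one-dimensional signal, extracting the patch-set is a sliding-window operation. The sketch below is illustrative: the helper name and the `hop` argument are our own, and `hop > 1` is just one simple way to coarsen the sampling of the patch-set in the signal domain (with `hop = d` it returns $\lfloor \text{len}/d \rfloor$ non-overlapping patches).

```python
import numpy as np

def extract_patches(signal, d, hop=1):
    """Return the patches (s[n], ..., s[n + d - 1]) as the rows of an array.

    hop > 1 keeps every hop-th patch, coarsening the sampling of the patch-set.
    """
    s = np.asarray(signal, dtype=float)
    starts = np.arange(0, len(s) - d + 1, hop)
    return np.stack([s[i:i + d] for i in starts])
```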

7.1.2. Choosing edge weights. In general, two principles guide the choice of edge weights in the patch-graph. On the one hand, patches that are very close should be connected with a large weight (short distance), while patches that are far apart should have a very small weight along their mutual edge. This principle is equivalent to the idea of only trusting local distances in $\mathbb{R}^d$. Such a requirement is intuitively reasonable if we assume that the patch-set represents a discretization of a nonlinear manifold in $\mathbb{R}^d$. In this situation, we know that when the points on the manifold are very close to one another, the geodesic distance is well approximated by the Euclidean distance. Conversely, because of curvature, the Euclidean distance is a poor approximation to the geodesic distance on the manifold when points are far apart. Because the only information available to us is the Euclidean distance between patches, we should not trust large Euclidean distances.

On the other hand, as observed in Section 3, the fast patches, which contain rapid changes, are all very far apart (large $\rho^2(\mathbf{x}_n,\mathbf{x}_m)$). Therefore the probability that the random walk escapes the fast patch $\mathbf{x}_n$ and jumps to a different patch $\mathbf{x}_m$, which is given by
$$P_{n,m} = \frac{w_{n,m}}{\sum_l w_{n,l}} = \frac{e^{-\rho^2(\mathbf{x}_n,\mathbf{x}_m)/\sigma}}{\sum_l w_{n,l}},$$
is always much smaller than the probability of staying at $\mathbf{x}_n$, which is given by
$$P_{n,n} = \frac{1}{\sum_l w_{n,l}}.$$

To prevent the random walk from being trapped at each node $\mathbf{x}_n$, we "saturate" the distance function by choosing $\sigma$ to be very large. In this case, for all the nearest neighbors $\mathbf{x}_m$ of $\mathbf{x}_n$, we have $w_{n,m} \approx 1$, and the transition probability is the same for all the neighbors, $P_{n,m} \approx 1/\nu$. This choice of $\sigma$ promotes a very fast diffusion of the random walk locally. We note that choosing a large $\sigma$ may be avoided if self-connections are not enforced (i.e., $w_{n,n} = 0$). However, self-connections are a necessary technical requirement to prove that the Markov process is aperiodic, which is required to prove the equality (4.4) [14].
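A minimal sketch of the effect of $\sigma$ on the transition probabilities. It assumes that $\rho$ is the Euclidean distance between patches after each patch is scaled to unit norm (a stand-in for (2.3)), that each patch counts itself among its $\nu$ nearest neighbors, that $w_{n,m} = e^{-\rho^2(\mathbf{x}_n,\mathbf{x}_m)/\sigma}$, and that the graph is symmetrized by keeping the larger of $w_{n,m}$ and $w_{m,n}$. These choices are illustrative, not the authors' exact construction.

```python
import numpy as np

def patch_graph(X, nu, sigma):
    """Weights w_{n,m} = exp(-rho^2 / sigma) on the nu nearest neighbors of each patch.

    Assumptions (illustrative): rho is the Euclidean distance between unit-norm
    patches, each patch is one of its own nu neighbors (self-loop with w = 1),
    and W is symmetrized by keeping the larger of w_{n,m} and w_{m,n}.
    """
    U = X / np.linalg.norm(X, axis=1, keepdims=True)        # project onto the sphere
    rho2 = ((U[:, None, :] - U[None, :, :]) ** 2).sum(axis=-1)
    N = len(X)
    W = np.zeros((N, N))
    for n in range(N):
        nbrs = np.argsort(rho2[n])[:nu]                     # n itself comes first (rho = 0)
        W[n, nbrs] = np.exp(-rho2[n, nbrs] / sigma)
    return np.maximum(W, W.T)

def transition_matrix(W):
    """P = D^{-1} W, the transition matrix of the random walk."""
    return W / W.sum(axis=1, keepdims=True)
```

With a tiny $\sigma$ the self-loop dominates each row of $P$ and the walk is effectively trapped; with a very large $\sigma$ the nonzero entries of each row become nearly equal. After symmetrization a row may have slightly more than $\nu$ nonzero entries.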

We note that choosing $\sigma$ to be very large does not entirely obliterate the information provided by the mutual distance between patches, measured when patches are projected on the sphere with $\rho(\mathbf{x}_n,\mathbf{x}_m)$ (2.3). Indeed, $\rho(\mathbf{x}_n,\mathbf{x}_m)$ is used to select the nearest neighbors of each patch, and therefore allows us to define a notion of a local neighborhood around each patch. Choosing $\sigma$ to be very large forces a very fast diffusion within this neighborhood, irrespective of the actual distances $\rho(\mathbf{x}_n,\mathbf{x}_m)$. Alternatively, we could let $\sigma$ vary adaptively from one neighborhood to another: $\sigma$ could be small when patches are extremely close to one another, and large when the patches are far from one another. This notion is the foundation of the self-tuning weight matrix, which adjusts its weights based on a point's local neighborhood [30].
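A sketch of one widely used self-tuning rule, in which each point receives a local scale $\sigma_n$ equal to the distance to its $K$-th nearest neighbor and $w_{n,m} = e^{-\|\mathbf{x}_n - \mathbf{x}_m\|^2/(\sigma_n \sigma_m)}$. Whether [30] uses exactly this form is an assumption on our part, and the neighborhood size `K` is a hypothetical parameter.

```python
import numpy as np

def self_tuning_weights(X, K=7):
    """w_{n,m} = exp(-||x_n - x_m||^2 / (sigma_n * sigma_m)), where sigma_n is the
    distance from x_n to its K-th nearest neighbor (requires K < number of points)."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    sigma = np.sort(dist, axis=1)[:, K]   # column 0 is the point itself (distance 0)
    return np.exp(-dist ** 2 / np.outer(sigma, sigma))
```

Because every $\sigma_n$ scales with the data, the weights are unchanged if all patches are multiplied by a constant, which is the adaptive behavior described above.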

7.2. Extensions and generalizations. In general, the patch-set of an image consists of more than two homogeneous subsets. For example, one could partition an image patch-set into uniform patches, edge patches, and texture patches. Our experience [40] with a generalization of the time-frequency signal model (section 6) indicates that we can still separate the patches when the signal is composed of up to four different local behaviors (specified by four different values of the parameter in the autocorrelation function). Another extension of this work involves the embedding of a patch-set constructed from a library of images. Recent studies [26] indicate that high-contrast patches extracted from optical images organize themselves around a two-dimensional smooth submanifold [8]. This idea has also been exploited to construct dictionaries that lead to very sparse representations of images (e.g., [18] and references therein). Finally, we note that our results about the embedding of the slow (5.1), fast (5.2), and fused (5.3) graphs are very general and can be applied to datasets [9] whose graphs exhibit a similar structure. For instance, one could imagine using this idea to study social networks, where cliques would correspond to fast subgraphs.

7.3. Related work. The concept of patches has proven extremely useful in many areas 
of image analysis: texture analysis/synthesis [28], image completion [31, 44], super-resolution 
[34], and denoising [5, 7, 22, 24, 33, 37, 38, 44]. While these references do not explicitly 
construct a patch-graph, these works all compute distances between patches, and use the 
nearest neighbors of each patch to analyze and process patches. Recent works on the analysis 
of time-series also use patches and construct networks of patches [4, 25, 43]. All these references 
provide experimental evidence for the success of working on image (or signal) patches. 
In this work, we provide a theoretical justification for this experimental success. We study the effect of the embedding $\Phi$ (4.8) on the organization of the patch-set. Our analysis assumes that there exists a natural partition of the patch-set into two classes: patches extracted from the smooth baseline and patches that contain sudden local changes of the image intensity or signal value. It is interesting to compare and contrast our work with the work of Singer, Shkolnisky, and Nadler [37], who provide a different theoretical explanation for the success of patch-based denoising algorithms. The authors in [37] treat the matrix $P$ as a filter, which acts on an $N$-dimensional column-vector representation of the signal or the image. Each multiplication of the probability distribution by $P$ is interpreted as the evolution of the diffusion process on the patch-graph over a time-step of duration $\sigma$. The results in [37] rely on the convergence of a properly normalized version of $P$ toward the backward Fokker-Planck operator. The authors compute the eigenfunctions of this operator when the signal is either a one-dimensional constant function perturbed by Gaussian noise, or a one-dimensional step function also contaminated by Gaussian noise.

In contrast, our analysis is based on the commute time on graphs that epitomize the patch-graph constructed from two classes of patches. In addition, we need not assume that the image is piecewise constant. In fact, our experiments demonstrate that our analysis can be applied to detect many different types of anomalies: changes in the local frequency content, changes in local regularity, etc. Furthermore, our theoretical analysis holds for finite values of the number of patches $N$. It is interesting to note that Singer et al. study the mean first-passage time between patches extracted from the noisy step function. The mean first-passage time is derived from the hitting time, which is used to define the commute time. The authors in [37] use an energy argument to explain the existence of a large mean first-passage time between patches extracted from either side of the step function's discontinuity. They argue that a high density of patches is associated with a lower potential energy, and consequently it takes longer for a random process to exit a well with such a low potential. Finally, our results are not limited to patches of size $d = 1$, as are the results in [37].

The energy argument in [37] adds an interesting interpretation to our analysis. Following this perspective, the slow patches can be interpreted as points sampled from a probability density function $P$ defined on $\mathbb{R}^d$ whose support lies along a low-dimensional manifold. This localization leads to a potential $U = -\log P$ with a deep and narrow well, from which the random walk cannot escape. This argument agrees with our finding that the average commute time between slow patches is very large: the random walk spends considerably more time in the slow subgraph before being able to reach a patch that is temporally far away.

From a more general perspective, this work presents an investigation into the diffusion process on the graph models presented in Section 5.2. Our work is thus related to a large body of work on the analysis of complex and random networks using first-passage times (e.g., [15] and references therein). This area is usually motivated by physical problems such as transport in disordered media, neuron firing, or energy flow on power grids, rather than by applications in signal processing.

7.4. Open questions. While we obtained estimates for the average commute time on the fast and slow graph models considered separately, it would be desirable to obtain similar estimates on the fused graph. At the moment, our analysis of the fused graph relies on numerical simulations. We are also aware of a small discrepancy in the upper bound on $\kappa_{\mathcal{F}}$: this bound is increasing with $L$, whereas we expect the commute time on $\mathcal{F}(N,p)$ to decrease as $p$, and therefore $L$, increases. The reason for this apparent inconsistency is that the proof of (5.7) relies on a loose upper bound for the effective resistance between two vertices, which is provided by the geodesic distance on the graph [10]. This inequality is not tight on a graph such as $\mathcal{F}(N,p)$. A more effective inequality, which could improve the upper bound (5.7), relies on the distribution of the number $s$ of edge-disjoint paths of length at most $\ell$ between the two vertices. We could then use the fact that the commute time is bounded from above by a constant times the ratio $\ell/s$ [10], which would decrease the upper bound in (5.7).
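The expected decrease of the commute time with $p$ can be probed numerically. The sketch below is our own illustration with hypothetical helper names and unit edge weights: it samples Erdős-Rényi graphs with self-loops and computes the exact average commute time as $\mathcal{V}$ times the effective resistance, obtained from the pseudoinverse of the graph Laplacian. It is an experiment on small graphs, not a proof.

```python
import numpy as np

def fast_graph(N, p, rng):
    """Sample F(N, p): Erdos-Renyi edges plus self-loops, all with unit weight."""
    upper = np.triu(rng.random((N, N)) < p, k=1)   # each pair i < j kept with prob. p
    W = (upper | upper.T).astype(float)
    np.fill_diagonal(W, 1.0)                       # self-connections
    return W

def mean_commute_time(W):
    """Average of kappa(x_n, x_m) = V * R_eff(x_n, x_m) over all pairs n < m."""
    d = W.sum(axis=1)
    lap = np.diag(d) - W                           # graph Laplacian (loops cancel)
    if np.sum(np.linalg.eigvalsh(lap) < 1e-9) != 1:
        raise ValueError("graph is disconnected; commute times are infinite")
    Lp = np.linalg.pinv(lap)
    g = np.diag(Lp)
    R = g[:, None] + g[None, :] - 2.0 * Lp         # effective resistances
    n, m = np.triu_indices(len(W), k=1)
    return d.sum() * R[n, m].mean()
```

Comparing the averages for a sparse and a dense setting of `p` on the same `N` gives a quick empirical view of the trend that the bound (5.7) fails to capture.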

Appendix A. The connectedness of the fast graph. The fast graph $\mathcal{F}(N,p)$ must be connected for the spectral decomposition of the commute time to apply. To ensure that the probability of $\mathcal{F}(N,p)$ being disconnected vanishes as $N$ gets large, we must choose $Np > \log N$ [17]. Since $p$ is defined as a function of $L$ in (5.3), any requirement on $p$ ultimately constrains $L$. First, because the maximum degree of a vertex in $\mathcal{S}(N,L)$ is $2L + 1$, according to (5.1), we require

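$$2L + 1 < N.$$
Manipulation of this inequality leads to
$$\frac{L+1}{N} < \frac{1}{2} + \frac{1}{2N}.$$
We assume that $N > 2$, so that $\frac{1}{2N} < \frac{1}{4}$ and therefore
$$\frac{L+1}{N} < \frac{3}{4}.$$
It follows that
$$2 - \frac{L+1}{N} > \frac{5}{4} > 1.$$
Therefore, rewriting (5.3) and using the last inequality together with $N/(N-1) > 1$, we have
$$Np = \frac{N}{N-1}\,L\left(2 - \frac{L+1}{N}\right) > L\left(2 - \frac{L+1}{N}\right) > L.$$
Therefore, choosing $L = c\log N$ for some $c > 1$ ensures that $Np > \log N$, and consequently, the probability of $\mathcal{F}(N,p)$ being disconnected approaches zero as $N$ approaches infinity.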
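The connectivity requirement can be checked by simulation. The sketch below is illustrative and uses only the standard library. It assumes that (5.3) chooses $p$ so that $\mathcal{F}(N,p)$ has, on average, as many edges between distinct vertices as $\mathcal{S}(N,L)$, namely $NL - L(L+1)/2$, which gives $p = L\,(2 - (L+1)/N)/(N-1)$; the helper names are hypothetical. Note that rounding $L = c\log N$ down to an integer can make $L$ slightly smaller than $c\log N$.

```python
import math
import random

def p_from_L(N, L):
    """Assumed form of (5.3): p = L (2 - (L + 1) / N) / (N - 1)."""
    return L * (2.0 - (L + 1) / N) / (N - 1)

def sample_is_connected(N, p, rng):
    """Draw the off-diagonal edges of F(N, p) and test connectivity by graph search."""
    adj = [[] for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == N

def disconnection_rate(N, c, trials=20, seed=0):
    """Fraction of sampled fast graphs that are disconnected when L = floor(c log N)."""
    rng = random.Random(seed)
    L = max(1, math.floor(c * math.log(N)))
    p = p_from_L(N, L)
    return sum(not sample_is_connected(N, p, rng) for _ in range(trials)) / trials
```

Lowering `c` well below 1 makes disconnected samples appear, in line with the threshold $Np > \log N$.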

Appendix B. Bounding the commute times in the graph models. 

B.1. Proof of the lower bound on the average commute time in the slow graph. In order to compute a lower bound on the average commute time, we consider a fixed pair of vertices in the slow graph, $\mathbf{x}_{n_0}$ and $\mathbf{x}_{m_0}$, and compute a lower bound on the commute time $\kappa(\mathbf{x}_{n_0},\mathbf{x}_{m_0})$. We can then compute the average of this lower bound over all the pairs of vertices. To obtain the lower bound on $\kappa(\mathbf{x}_{n_0},\mathbf{x}_{m_0})$, we use a standard tool for lower-bounding commute times: the Nash-Williams inequality [29]. The Nash-Williams inequality is usually formulated in terms of electrical networks. We prefer to present an equivalent formulation that is directly adapted to our problem. We first introduce the concept of an edge-cutset.

Definition B.1. Let $V_1$ and $V_2$ be two disjoint sets of vertices. A set of edges $E$ is an edge-cutset separating $V_1$ and $V_2$ if every path that connects a vertex in $V_1$ with a vertex in $V_2$ includes an edge in $E$.

Given a weighted graph, which may contain loops, we define a random walk with the probability transition matrix $P_{n,m} = w_{n,m}/D_{n,n}$. Let $\mathbf{x}_{m_0}$ and $\mathbf{x}_{n_0}$ be two vertices. The commute time between vertices $\mathbf{x}_{m_0}$ and $\mathbf{x}_{n_0}$, $\kappa(\mathbf{x}_{m_0},\mathbf{x}_{n_0})$, satisfies the following lower bound.

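Lemma B.2 (Nash-Williams). If $\mathbf{x}_{m_0}$ and $\mathbf{x}_{n_0}$ are distinct vertices in a graph that are separated by disjoint edge-cutsets $E_k$, $k = 1, 2, \dots$, then
$$\mathcal{V} \sum_{k} \Bigg( \sum_{\{\mathbf{x}_n,\mathbf{x}_m\} \in E_k} w_{n,m} \Bigg)^{-1} \le \kappa(\mathbf{x}_{m_0},\mathbf{x}_{n_0}), \tag{B.1}$$
where $\{\mathbf{x}_m,\mathbf{x}_n\}$ is an edge in the cutset $E_k$, and where the volume of the graph is defined by $\mathcal{V} = \sum_{i=1}^{N}\sum_{j=1}^{N} w_{i,j}$.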

We now exhibit a sequence of edge-cutsets in the slow graph. We refer to Figure 18 for the construction of the cutsets. We first define the cutset $E_1$. If $m_0 \le L$, then $E_1$ needs a little more attention and is defined as the set of edges $\{\mathbf{x}_i,\mathbf{x}_j\}$, where $i$ and $j$ are defined by
$$i = 1, \dots, m_0, \qquad j = m_0 + 1, \dots, L + i.$$
The edge-cutset $E_1$ is shown in Figure 18 for $m_0 = 1$ (left) and $m_0 = 2$ (center), for $L = 3$. The removal of this set of edges prevents $\mathbf{x}_{m_0}$ from being connected to $\mathbf{x}_{n_0}$. Indeed, the self-loop on the diagonal (green entry) does not allow the random walk to move toward $\mathbf{x}_{n_0}$. This can also be visualized in Figure 19, where $E_1$ is the leftmost set of edges that connect $\mathbf{x}_{m_0}$ to the part of the graph that is connected to $\mathbf{x}_{n_0}$. The sum of edge weights in $E_1$ is at most $L(L+1)w_{\mathcal{S}}/2$. If $m_0 > L$, then $E_1$ is defined in the same way as the other, generic, edge-cutsets.

We now define the generic edge-cutset $E_k$ as the set of $L(L+1)/2$ edges $\{\mathbf{x}_i,\mathbf{x}_j\}$ such that
$$i = m_0 + 1 + (k-2)L, \dots, m_0 + (k-1)L, \qquad j = m_0 + 1 + (k-1)L, \dots, L + i.$$
As seen in Figure 18 (right) for $k = 3$, setting the entries of $E_3$ to zero disconnects the upper and lower parts of the submatrix $W(m_0{:}n_0, m_0{:}n_0)$, thereby isolating $\mathbf{x}_{m_0}$ and $\mathbf{x}_{n_0}$. Alternatively, we also see in Figure 19 that any path from $\mathbf{x}_{m_0}$ to $\mathbf{x}_{n_0}$ needs to go through $E_3$. Each edge-cutset $E_k$, $k \ge 2$, is a triangle with a height of size $L$. Therefore, after creating $E_1$, we can fit $\lceil (n_0 - m_0)/L \rceil - 1$ such cutsets between $\mathbf{x}_{m_0+1}$ and $\mathbf{x}_{n_0}$. The sum of the weights along the edges of each cutset $E_k$, $k = 2, \dots$, is given by $L(L+1)w_{\mathcal{S}}/2$. In addition, the sum of edge weights in the first cutset $E_1$ is at most $L(L+1)w_{\mathcal{S}}/2$. Putting everything together, the computation of the lower bound using the Nash-Williams Lemma yields
$$\begin{aligned}
\kappa(\mathbf{x}_{m_0},\mathbf{x}_{n_0}) &\ge \left[N(2L+1) - L(L+1)\right] w_{\mathcal{S}} \left( \frac{2}{L(L+1)w_{\mathcal{S}}} + \left( \left\lceil \frac{n_0 - m_0}{L} \right\rceil - 1 \right) \frac{2}{L(L+1)w_{\mathcal{S}}} \right) \\
&\ge \frac{2\left[N(2L+1) - L(L+1)\right]}{L(L+1)} \, \frac{n_0 - m_0}{L}.
\end{aligned}$$

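We can summarize this result in the following lemma.

Lemma B.3. The commute time between vertices $\mathbf{x}_{n_0}$ and $\mathbf{x}_{m_0}$ inside $\mathcal{S}(N,L)$ satisfies
$$\kappa(\mathbf{x}_{m_0},\mathbf{x}_{n_0}) \ge \frac{2\left[N(2L+1) - L(L+1)\right]}{L(L+1)} \, \frac{n_0 - m_0}{L}. \tag{B.4}$$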
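Lemma B.3 can be sanity-checked numerically. The sketch below is our own illustration with hypothetical helper names: it builds the band weight matrix of $\mathcal{S}(N,L)$, assuming unit weights on all edges and self-loops as in Figure 18, computes exact commute times as $\mathcal{V}$ times the effective resistance, and compares them with the bound $2[N(2L+1) - L(L+1)]\,(n_0 - m_0)/(L^2(L+1))$.

```python
import numpy as np

def slow_graph(N, L, w=1.0):
    """Band weight matrix of S(N, L): w_{i,j} = w whenever |i - j| <= L (self-loops included)."""
    idx = np.arange(N)
    return w * (np.abs(idx[:, None] - idx[None, :]) <= L).astype(float)

def commute_times(W):
    """Exact commute times kappa = V * R_eff, with R_eff from the Laplacian pseudoinverse."""
    d = W.sum(axis=1)
    Lp = np.linalg.pinv(np.diag(d) - W)
    g = np.diag(Lp)
    return d.sum() * (g[:, None] + g[None, :] - 2.0 * Lp)

def lemma_b3_bound(N, L, gap):
    """Right-hand side of Lemma B.3 for n0 - m0 = gap."""
    return 2.0 * (N * (2 * L + 1) - L * (L + 1)) / (L * (L + 1)) * gap / L
```

The volume of this matrix is $N(2L+1) - L(L+1)$ when $w = 1$, which is the factor that appears in the bound.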



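Finally, we bound the average commute time in the slow graph. Observe that the slow graph model $\mathcal{S}(N,L)$ has $N - j$ pairs of vertices such that $|m - n| = j$, for $j = 1, \dots, N-1$. Therefore, using the lower bound given in Lemma B.3, it follows that
$$\sum_{1 \le m < n \le N} \kappa(\mathbf{x}_m,\mathbf{x}_n) \ge \frac{2\left[N(2L+1) - L(L+1)\right]}{L^2(L+1)} \sum_{j=1}^{N-1} (N-j)\, j.$$
But
$$\sum_{j=1}^{N-1} (N-j)\, j = N \sum_{j=1}^{N-1} j - \sum_{j=1}^{N-1} j^2 = \frac{N^2(N-1)}{2} - \frac{(N-1)N(2N-1)}{6} = \frac{N(N-1)(N+1)}{6}.$$
Dividing both sides by $N(N-1)/2$ and simplifying yields (5.6).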
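The counting step above is easy to verify by brute force. The helpers below are illustrative (standard library only): `pair_gap_sum` groups the pairs by their gap $j = n - m$, and `average_lower_bound` averages the right-hand side of Lemma B.3 over all $N(N-1)/2$ pairs, which by the identity equals the constant of the lemma times $(N+1)/(3L)$.

```python
def pair_gap_sum(N):
    """Sum over 1 <= m < n <= N of (n - m), grouped by the gap j = n - m."""
    return sum((N - j) * j for j in range(1, N))

def average_lower_bound(N, L):
    """Average over all pairs of the Lemma B.3 lower bound on the commute time."""
    c = 2.0 * (N * (2 * L + 1) - L * (L + 1)) / (L * L * (L + 1))
    return c * pair_gap_sum(N) / (N * (N - 1) / 2)
```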
Figure 18. The small squares represent the nonzero entries in the upper triangular portion of the weight matrix $W$ of $\mathcal{S}(N,L)$. The green entries on the diagonal are the self-loops. The edge-cutsets $E_k$ are shown in red for $m_0 = 1$ (left), $m_0 = 2$ (center), and for $m_0 > L$ (right). The submatrix $W(m_0{:}n_0, m_0{:}n_0)$ is also shown.
Figure 19. Top: edge-cutsets $E_1$ and $E_3$. Bottom: any path from $m_0$ to $n_0$ needs to use an edge of the edge-cutset $E_3$.



B.2. Proof of the upper bound on the average commute time in the fast graph. Our approach relies on the relationship between electrical networks and random walks on graphs [16]. We begin by introducing the property of interest, the effective resistance, and its relationship to the commute time.

The electrical network perspective. For each pair of vertices $\mathbf{x}_n$ and $\mathbf{x}_m$ with a nonzero weight $w_{n,m}$, we assign the resistance
$$\frac{1}{w_{n,m}} \tag{B.5}$$
to the edge $\{\mathbf{x}_n,\mathbf{x}_m\}$. We note that if $w_{n,m} = 0$, then there is no connection between $\mathbf{x}_n$ and $\mathbf{x}_m$, and no resistance to consider. Now, consider applying a potential difference, or voltage, across the vertices $\mathbf{x}_{m_0}$ and $\mathbf{x}_{n_0}$. As a result, some current flows across the resistors (edges) in the electrical network (graph). We may replace the set of resistors across which some current flows by an equivalent, effective resistance $R_{m_0,n_0}$ connected between $\mathbf{x}_{m_0}$ and $\mathbf{x}_{n_0}$. The effective resistance $R_{m_0,n_0}$ is defined by the voltage necessary to maintain a one-unit current between $\mathbf{x}_{m_0}$ and $\mathbf{x}_{n_0}$. The main result in [10] is that the commute time between vertices $\mathbf{x}_{m_0}$ and $\mathbf{x}_{n_0}$ can be expressed as
$$\kappa(\mathbf{x}_{m_0},\mathbf{x}_{n_0}) = \mathcal{V} R_{m_0,n_0}. \tag{B.6}$$

Taking expectations of both sides of Equation (B.6) with respect to the process of generating edges and choosing terminals in a fast graph, we obtain
$$\kappa_{\mathcal{F}} = \mathbb{E}(\mathcal{V})\,\mathbb{E}(R) + \operatorname{Cov}(\mathcal{V},R). \tag{B.7}$$

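Notice that every edge in the fast graph has weight $w_{\mathcal{F}}$. Therefore, $\mathcal{V}$ can be expressed as
$$\mathcal{V} = \sum_{n=1}^{N} w_{n,n} + 2\sum_{1 \le m < l \le N} w_{m,l} = w_{\mathcal{F}} N + 2 w_{\mathcal{F}} \mathcal{N}, \tag{B.8}$$
where $\mathcal{N}$ is a binomial random variable representing the number of edges connecting distinct vertices in the fast graph. We now rewrite (B.7), using (B.8) and the assumption that $\operatorname{Cov}(\mathcal{V},R) \le 0$, to obtain
$$\kappa_{\mathcal{F}} \le w_{\mathcal{F}}\left[N + 2\,\mathbb{E}(\mathcal{N})\right]\mathbb{E}(R).$$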



Recall that $\mathcal{N}$ is distributed as a binomial random variable with parameters $(N(N-1)/2,\, p)$. Also, the effective resistance between two nodes of a network is at most the geodesic distance $\delta$ between them, scaled by $1/w_{\mathcal{F}}$ [10]. It follows that
$$\kappa_{\mathcal{F}} \le \left[N(N-1)p + N\right]\mathbb{E}(\delta).$$
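Both facts used in this step, the expression (B.8) for the volume and the bound $R \le \delta/w_{\mathcal{F}}$ on the effective resistance, can be checked on a sampled fast graph. The sketch below is our own illustration with hypothetical helper names: effective resistances come from the pseudoinverse of the Laplacian $D - W$, and geodesic (hop-count) distances from breadth-first search.

```python
import numpy as np
from collections import deque

def effective_resistance(W):
    """R_eff(i, j) between all pairs, from the pseudoinverse of the Laplacian D - W."""
    d = W.sum(axis=1)
    Lp = np.linalg.pinv(np.diag(d) - W)
    g = np.diag(Lp)
    return g[:, None] + g[None, :] - 2.0 * Lp

def hop_distances(W):
    """Geodesic distances delta (edges on a shortest path), by breadth-first search."""
    N = len(W)
    dist = np.full((N, N), np.inf)
    for s in range(N):
        dist[s, s] = 0.0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in np.nonzero(W[v])[0]:
                if dist[s, u] == np.inf:
                    dist[s, u] = dist[s, v] + 1.0
                    queue.append(u)
    return dist
```

Each shortest path is a chain of $\delta$ resistors of value $1/w_{\mathcal{F}}$, and adding the rest of the graph in parallel can only lower the resistance, which is why the inequality holds pair by pair.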

The authors of [20] give a closed-form expression for $\mathbb{E}(\delta)$ on Erdős-Rényi graphs, which we can use since the self-connections of the fast graph do not change the geodesic distance. This yields
$$\kappa_{\mathcal{F}} \le \left[N(N-1)p + N\right] \left( \frac{\log N - \gamma_e}{\log\left((N-1)p + 1\right)} + \frac{1}{2} \right),$$
where $\gamma_e \approx 0.5772$ is Euler's constant. Simplification using (5.3) gives the desired result.

Remark. Although $\operatorname{Cov}(\mathcal{V},R) \le 0$ is an assumption, we conjecture that it is always satisfied: increasing the number of resistors $M$ in an electrical network with a fixed number of nodes is effectively like adding resistors in parallel, and, according to Rayleigh's Monotonicity Law, adding edges (increasing $M$) can only decrease the effective resistance [16].
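Rayleigh's Monotonicity Law is easy to observe on a toy network. The sketch below is illustrative: it computes all effective resistances from the Laplacian pseudoinverse for a path of three unit resistors, then adds one edge and checks that no resistance increases.

```python
import numpy as np

def effective_resistance(W):
    """R_eff(i, j) between all pairs, from the pseudoinverse of the Laplacian D - W."""
    d = W.sum(axis=1)
    Lp = np.linalg.pinv(np.diag(d) - W)
    g = np.diag(Lp)
    return g[:, None] + g[None, :] - 2.0 * Lp

# Path 0 - 1 - 2 - 3 made of unit resistors (weights w = 1).
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
R_before = effective_resistance(W)

# Adding the edge {0, 3} places a unit resistor in parallel with the path.
W2 = W.copy()
W2[0, 3] = W2[3, 0] = 1.0
R_after = effective_resistance(W2)
```

Between vertices 0 and 3 the resistance drops from 3 (three resistors in series) to $3 \cdot 1/(3 + 1) = 0.75$ (the path in parallel with the new edge).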
Appendix C. Generating a random trigonometric polynomial with a specified autocorrelation. Let $z(t)$ represent a random trigonometric polynomial on $[0,1)$ with an autocorrelation function given by
$$C(\tau) = 2\left(\cos(\pi\tau)\right)^{2\beta} - 1 \quad \text{for } \tau \in \left[-\tfrac{1}{2}, \tfrac{1}{2}\right] \tag{C.1}$$
for some nonnegative integer $\beta$. It follows that we can do a Fourier expansion of $C(\tau)$ to obtain
$$C(\tau) = \sum_{j} C_j e^{2\pi i j \tau}, \tag{C.2}$$
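where $i = \sqrt{-1}$ and
$$\begin{aligned}
C_j &= \int_{-1/2}^{1/2} C(\tau)\, e^{-2\pi i j \tau}\, d\tau
= \int_{-1/2}^{1/2} \left( 2^{1-2\beta} \sum_{k=0}^{2\beta} \binom{2\beta}{k} e^{2\pi i(\beta-k)\tau} - 1 \right) e^{-2\pi i j \tau}\, d\tau \\
&= \begin{cases}
2^{1-2\beta}\dbinom{2\beta}{\beta} - 1 & \text{if } j = 0,\\[4pt]
2^{1-2\beta}\dbinom{2\beta}{\beta - |j|} & \text{if } 0 < |j| \le \beta,\\[4pt]
0 & \text{if } |j| > \beta,
\end{cases}
\end{aligned}$$
where the second equality follows after expressing cosine with complex exponentials and applying the binomial theorem.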

It is clear that $2\beta$ is the frequency of the fastest sinusoid making up the random signal $z(t)$, and that most of the energy is on average at frequency $\beta$. Let $A_j$ and $B_j$ be independent and identically distributed Normal random variables with zero mean and unit variance. Define

$(A_j + iB_j)$.

Finally, the signal $z(t)$ is defined as
$$z(t) = \sum_{|j| \le 2\beta} z_j\, e^{2\pi i j t}.$$

To check that the signal $z(t)$ defined above has the correct autocorrelation, observe that linearity of the expectation, independence and zero mean of the random variables, and the fact that $C_j = C_{-j}$ together imply that
$$\begin{aligned}
\mathbb{E}\big(z(t)\,z(t+\tau)\big) &= \sum_{|j| \le 2\beta} \sum_{|k| \le 2\beta} \mathbb{E}(z_j z_k)\, e^{2\pi i j t} e^{2\pi i k (t+\tau)} \\
&= \sum_{|j| \le 2\beta} C_j\, e^{2\pi i j t} e^{-2\pi i j (t+\tau)}
= \sum_{|j| \le 2\beta} C_j\, e^{2\pi i j \tau}.
\end{aligned}$$
Therefore, referencing (C.2), it follows that $\mathbb{E}\big(z(t)\,z(t+\tau)\big) = C(\tau)$.
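The closed form for the Fourier coefficients $C_j$ can be verified numerically. The sketch below is illustrative and assumes (C.1) in the form $C(\tau) = 2\cos^{2\beta}(\pi\tau) - 1$ on $[-1/2, 1/2]$. Because $C(\tau)e^{-2\pi i j\tau}$ is a trigonometric polynomial of low degree, the midpoint rule with many more nodes than that degree integrates it exactly up to rounding.

```python
import numpy as np
from math import comb

def C(tau, beta):
    """Autocorrelation (C.1): C(tau) = 2 cos(pi tau)^(2 beta) - 1 on [-1/2, 1/2]."""
    return 2.0 * np.cos(np.pi * tau) ** (2 * beta) - 1.0

def fourier_coeff_numeric(beta, j, M=4096):
    """C_j = int_{-1/2}^{1/2} C(tau) exp(-2 pi i j tau) d tau, by the midpoint rule."""
    tau = -0.5 + (np.arange(M) + 0.5) / M
    return np.mean(C(tau, beta) * np.exp(-2j * np.pi * j * tau))

def fourier_coeff_closed(beta, j):
    """Closed form obtained from the binomial expansion of cos(pi tau)^(2 beta)."""
    if abs(j) > beta:
        return 0.0
    c = 2.0 ** (1 - 2 * beta) * comb(2 * beta, beta - abs(j))
    return c - 1.0 if j == 0 else c
```

The symmetry `fourier_coeff_closed(beta, j) == fourier_coeff_closed(beta, -j)` is the property $C_j = C_{-j}$ used in the last step of the derivation.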